Expression monitoring by hybridization to high density oligonucleotide arrays

ABSTRACT

This invention provides methods of monitoring the expression levels of a multiplicity of genes. The methods involve hybridizing a nucleic acid sample to a high density array of oligonucleotide probes, where the high density array contains oligonucleotide probes complementary to subsequences of target nucleic acids in the nucleic acid sample. In one embodiment, the method involves providing a pool of target nucleic acids comprising RNA transcripts of one or more target genes, or nucleic acids derived from the RNA transcripts; hybridizing said pool of nucleic acids to an array of oligonucleotide probes immobilized on a surface, where the array comprises more than 100 different oligonucleotides, each different oligonucleotide is localized in a predetermined region of the surface, the density of the different oligonucleotides is greater than about 60 different oligonucleotides per 1 cm², and the oligonucleotide probes are complementary to the RNA transcripts or nucleic acids derived from the RNA transcripts; and quantifying the hybridized nucleic acids in the array.

BACKGROUND OF THE INVENTION

Many disease states are characterized by differences in the expression levels of various genes, either through changes in the copy number of the genetic DNA or through changes in levels of transcription (e.g., through control of initiation, provision of RNA precursors, RNA processing, etc.) of particular genes. For example, losses and gains of genetic material play an important role in malignant transformation and progression. These gains and losses are thought to be “driven” by at least two kinds of genes. Oncogenes are positive regulators of tumorigenesis, while tumor suppressor genes are negative regulators of tumorigenesis (Marshall, Cell, 64: 313-326 (1991); Weinberg, Science, 254: 1138-1146 (1991)). Therefore, one mechanism of activating unregulated growth is to increase the number of genes coding for oncogene proteins or to increase the level of expression of these oncogenes (e.g., in response to cellular or environmental changes), and another is to lose genetic material or to decrease the level of expression of genes that code for tumor suppressors. This model is supported by the losses and gains of genetic material associated with glioma progression (Mikkelson et al. J. Cellular Biochem. 46: 3-8 (1991)). Thus, changes in the expression (transcription) levels of particular genes (e.g., oncogenes or tumor suppressors) serve as signposts for the presence and progression of various cancers.

Similarly, control of the cell cycle and cell development, as well as diseases, are characterized by variations in the transcription levels of particular genes. For example, a viral infection is often characterized by the elevated expression of genes of the particular virus. Outbreaks of Herpes simplex, Epstein-Barr virus infections (e.g., infectious mononucleosis), cytomegalovirus, Varicella-zoster virus infections, parvovirus infections, human papillomavirus infections, etc. are all characterized by elevated expression of various genes present in the respective virus. Detection of elevated expression levels of characteristic viral genes provides an effective diagnostic of the disease state. In particular, viruses such as herpes simplex enter quiescent states for periods of time only to erupt in brief periods of rapid replication. Detection of expression levels of characteristic viral genes allows detection of such active proliferative (and presumably infective) states.

Oligonucleotide probes have long been used to detect complementary nucleic acid sequences in a nucleic acid of interest (the “target” nucleic acid) and have been used to detect expression of particular genes (e.g., in a Northern blot). In some assay formats, the oligonucleotide probe is tethered, i.e., by covalent attachment, to a solid support, and arrays of oligonucleotide probes immobilized on solid supports have been used to detect specific nucleic acid sequences in a target nucleic acid. See, e.g., PCT patent publication Nos. WO 89/10977 and 89/11548. Others have proposed the use of large numbers of oligonucleotide probes to provide the complete nucleic acid sequence of a target nucleic acid but failed to provide an enabling method for using arrays of immobilized probes for this purpose. See U.S. Pat. Nos. 5,202,231 and 5,002,867 and PCT patent publication No. WO 93/17126.

The use of “traditional” hybridization protocols for monitoring or quantifying gene expression is problematic. For example, two or more gene products of approximately the same molecular weight will prove difficult or impossible to distinguish in a Northern blot because they are not readily separated by electrophoretic methods. Similarly, because hybridization efficiency and cross-reactivity vary with the particular subsequence (region) of a gene being probed, it is difficult to obtain an accurate and reliable measure of gene expression with one, or even a few, probes to the target gene.

The development of VLSIPS™ technology provided methods for synthesizing arrays of many different oligonucleotide probes that occupy a very small surface area. See U.S. Pat. No. 5,143,854 and PCT patent publication No. WO 90/15070. U.S. patent application Ser. No. 082,937, filed Jun. 25, 1993, describes methods for making arrays of oligonucleotide probes that can be used to provide the complete sequence of a target nucleic acid and to detect the presence of a nucleic acid containing a specific nucleotide sequence.

Prior to the present invention, however, it was unknown that high density oligonucleotide arrays could be used to reliably monitor message levels of a multiplicity of preselected genes in the presence of a large abundance of other (non-target) nucleic acids (e.g., in a cDNA library, DNA reverse transcribed from an mRNA, mRNA used directly or amplified, or polymerized from a DNA template). In addition, the prior art provided no rapid and effective method for identifying a set of oligonucleotide probes that maximizes specific hybridization efficacy while minimizing cross-reactivity, nor for using hybridization patterns (in particular, hybridization patterns of a multiplicity of oligonucleotide probes in which multiple oligonucleotide probes are directed to each target nucleic acid) to quantify target nucleic acid concentrations.

SUMMARY OF THE INVENTION

The present invention is premised, in part, on the discovery that microfabricated arrays of large numbers of different oligonucleotide probes (DNA chips) may effectively be used not only to detect the presence or absence of target nucleic acid sequences, but also to quantify the relative abundance of the target sequences in a complex nucleic acid pool. In particular, prior to this invention it was unknown that hybridization to high density probe arrays would permit small variations in expression levels of a particular gene to be identified and quantified in a complex population of nucleic acids that outnumber the target nucleic acids by 1,000-fold to 1,000,000-fold or more.

Thus, this invention provides a method of simultaneously monitoring the expression (e.g., detecting and/or quantifying the expression) of a multiplicity of genes. The levels of transcription for virtually any number of genes may be determined simultaneously. Typically, at least about 10 genes, preferably at least about 100, more preferably at least about 1000, and most preferably at least about 10,000 different genes are assayed at one time.

The method involves providing a pool of target nucleic acids comprising mRNA transcripts of one or more of said genes, or nucleic acids derived from the mRNA transcripts; hybridizing the pool of nucleic acids to an array of oligonucleotide probes immobilized on a surface, where the array comprises more than 100 different oligonucleotides, each different oligonucleotide is localized in a predetermined region of said surface, the density of the different oligonucleotides is greater than about 60 different oligonucleotides per 1 cm², and the oligonucleotide probes are complementary to the mRNA transcripts or nucleic acids derived from the mRNA transcripts; and quantifying the hybridized nucleic acids in the array. In a preferred embodiment, the pool of target nucleic acids is one in which the concentration of the target nucleic acids (mRNA transcripts or nucleic acids derived from the mRNA transcripts) is proportional to the expression levels of the genes encoding those target nucleic acids.

In a preferred embodiment, the array of oligonucleotide probes is a high density array comprising greater than about 100, preferably greater than about 1,000, more preferably greater than about 16,000, and most preferably greater than about 65,000 or 250,000 or even 1,000,000 different oligonucleotide probes. Such high density arrays comprise a probe density of generally greater than about 60, more generally greater than about 100, most generally greater than about 600, often greater than about 1000, more often greater than about 5,000, most often greater than about 10,000, preferably greater than about 40,000, more preferably greater than about 100,000, and most preferably greater than about 400,000 different oligonucleotide probes per cm². The oligonucleotide probes range from about 5 to about 50 nucleotides, more preferably from about 10 to about 40 nucleotides, and most preferably from about 15 to about 40 nucleotides in length. The array may comprise more than 10, preferably more than 50, more preferably more than 100, and most preferably more than 1000 oligonucleotide probes specific for each target gene. Although a planar array surface is preferred, the array may be fabricated on a surface of virtually any shape or even on a multiplicity of surfaces.

The array may further comprise mismatch control probes. Where such mismatch controls are present, the quantifying step may comprise calculating the difference in hybridization signal intensity between each of the oligonucleotide probes and its corresponding mismatch control probe. The quantifying may further comprise calculating, for each gene, the average difference in hybridization signal intensity between each of the oligonucleotide probes and its corresponding mismatch control probe.
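
By way of illustration only, the per-gene average difference described above can be sketched in Python. The sketch assumes that intensities have already been read from the scanned array as (probe, mismatch control) pairs grouped by gene; the function names and example values are hypothetical and are not data from this disclosure.

```python
from statistics import mean


def average_difference(pairs):
    """Average (probe - mismatch control) intensity over one gene's probe pairs.

    pairs: iterable of (probe_intensity, mismatch_intensity) tuples.
    """
    return mean(pm - mm for pm, mm in pairs)


def average_difference_by_gene(pairs_by_gene):
    """Apply average_difference to every gene's list of probe pairs."""
    return {gene: average_difference(pairs) for gene, pairs in pairs_by_gene.items()}


# Hypothetical intensities, for illustration only.
example = {
    "IL-4": [(1200.0, 300.0), (950.0, 250.0), (400.0, 380.0)],
    "GAPDH": [(5000.0, 800.0), (4200.0, 600.0)],
}
print(average_difference_by_gene(example))  # {'IL-4': 540.0, 'GAPDH': 3900.0}
```

Averaging the paired differences, rather than raw probe intensities, lets each mismatch control subtract its own local non-specific signal before the probes for a gene are combined.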

The probes present in the high density array can be oligonucleotide probes selected according to the optimization methods described below. Alternatively, non-optimal probes may be included in the array, but the probes used for quantification (analysis) can be selected according to the optimization methods described below.

Oligonucleotide arrays for the practice of this invention are preferably synthesized by light-directed very large scale immobilized polymer synthesis (VLSIPS) as described herein. The array includes test probes, which are oligonucleotide probes each of which has a sequence that is complementary to a subsequence of one of the genes (or the mRNA or the corresponding antisense cRNA) whose expression is to be detected. In addition, the array can contain normalization controls, mismatch controls and expression level controls as described herein.

The pool of nucleic acids may be labeled before, during, or after hybridization, although in a preferred embodiment the nucleic acids are labeled before hybridization. Fluorescent labels are particularly preferred and, where used, the hybridized nucleic acids are quantified by measuring fluorescence from the hybridized, fluorescently labeled nucleic acid. Such quantification is facilitated by the use of a fluorescence microscope, which can be equipped with an automated stage to permit automatic scanning of the array, and with a data acquisition system for the automated measurement, recording, and subsequent processing of the fluorescence intensity information.

In a preferred embodiment, hybridization is at low stringency (e.g., about 20° C. to about 50° C., more preferably about 30° C. to about 40° C., and most preferably about 37° C. and 6×SSPE-T or lower) with at least one wash at higher stringency. Hybridization may include subsequent washes at progressively increasing stringency until a desired level of hybridization specificity is reached.

The pool of target nucleic acids can be the total polyA⁺ mRNA isolated from a biological sample, cDNA made by reverse transcription of the RNA, second strand cDNA, or RNA transcribed from the double stranded cDNA intermediate. Alternatively, the pool of target nucleic acids can be treated to reduce the complexity of the sample and thereby reduce the background signal obtained in hybridization. In one approach, a pool of mRNAs derived from a biological sample is hybridized with a pool of oligonucleotides comprising the oligonucleotide probes present in the high density array. The pool of hybridized nucleic acids is then treated with RNase A, which digests the single stranded regions. The remaining double stranded hybridization complexes are then denatured and the oligonucleotide probes are removed, leaving a pool of mRNAs enriched for those mRNAs complementary to the oligonucleotide probes in the high density array.

In another approach to background reduction, a pool of mRNAs derived from a biological sample is hybridized with paired target-specific oligonucleotides, where the paired target-specific oligonucleotides are complementary to regions flanking subsequences of the mRNAs complementary to the oligonucleotide probes in the high density array. The pool of hybridized nucleic acids is treated with RNase H, which digests the hybridized (double stranded) nucleic acid sequences. The remaining single stranded nucleic acid sequences that have a length about equivalent to the region flanked by the paired target-specific oligonucleotides are then isolated (e.g., by electrophoresis) and used as the pool of nucleic acids for monitoring gene expression.

Finally, a third approach to background reduction involves eliminating or reducing the representation in the pool of particular preselected target mRNA messages (e.g., messages that are characteristically overexpressed in the sample). This method involves hybridizing an oligonucleotide probe that is complementary to the preselected target mRNA message to the pool of polyA⁺ mRNAs derived from a biological sample. The oligonucleotide probe hybridizes with the particular preselected polyA⁺ mRNA (message) to which it is complementary. The pool of hybridized nucleic acids is treated with RNase H, which digests the double stranded (hybridized) region, thereby separating the message from its polyA⁺ tail. Isolating or amplifying (e.g., using an oligo dT column) the polyA⁺ mRNA in the pool then provides a pool having reduced or no representation of the preselected target mRNA message.

It will be appreciated that the methods of this invention can be used to monitor (detect and/or quantify) the expression of any desired gene of known sequence or subsequence. Moreover, these methods permit monitoring expression of a large number of genes simultaneously and provide significant savings in labor, cost and time. The simultaneous monitoring of the expression levels of a multiplicity of genes permits effective comparison of relative expression levels and identification of biological conditions characterized by alterations of relative expression levels of various genes. Genes of particular interest for expression monitoring include genes involved in the pathways associated with various pathological conditions (e.g., cancer), whose expression is thus indicative of the pathological condition. Such genes include, but are not limited to, the HER2 (c-erbB-2/neu) proto-oncogene in the case of breast cancer; receptor tyrosine kinases (RTKs) associated with the etiology of a number of tumors, including carcinomas of the breast, liver, bladder and pancreas, as well as glioblastomas, sarcomas and squamous carcinomas; and tumor suppressor genes such as the P53 gene and other “marker” genes such as RAS, MSH2, MLH1 and BRCA1. Other genes of particular interest for expression monitoring are genes involved in the immune response (e.g., interleukin genes), as well as genes involved in cell adhesion (e.g., the integrins or selectins) and signal transduction (e.g., tyrosine kinases), etc.

In another embodiment, this invention provides a method of selecting a set of oligonucleotide probes that specifically bind to a target nucleic acid (e.g., a gene or genes whose expression is to be monitored, or nucleic acids derived from the gene or its transcribed mRNA). The method involves providing a high density array of oligonucleotide probes, where the array comprises a multiplicity of probes, each of which is complementary to a subsequence of the target nucleic acid. The target nucleic acid is then hybridized to the array of oligonucleotide probes to identify and select those probes for which the difference in hybridization signal intensity between the probe and its mismatch control is detectable (preferably greater than about 10% of the background signal intensity, more preferably greater than about 20% of the background signal intensity, and most preferably greater than about 50% of the background signal intensity). The method can further comprise hybridizing the array to a second pool of nucleic acids comprising nucleic acids other than the target nucleic acids, and identifying and selecting probes having the lowest hybridization signal, where both the probe and its mismatch control have a hybridization intensity equal to or less than about 5 times the background signal intensity, preferably equal to or less than about 2 times the background signal intensity, more preferably equal to or less than about 1 times the background signal intensity, and most preferably equal to or less than about half the background signal intensity.
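
A minimal sketch of the two selection criteria above, assuming per-probe intensities from the target hybridization and from the non-target hybridization are already available, together with a background value for each. The `ProbeResult` record, the function name and the default thresholds (a difference above 50% of background; non-target signal at or below 1 times background) are hypothetical choices taken from the ranges given above, not a prescribed implementation.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    probe_id: str
    pm_target: float      # probe intensity, hybridized with the target
    mm_target: float      # mismatch-control intensity, hybridized with the target
    pm_nontarget: float   # probe intensity, hybridized with the non-target pool
    mm_nontarget: float   # mismatch-control intensity, non-target pool


def select_probes(results, bg_target, bg_nontarget,
                  min_diff_fraction=0.5, max_nontarget_multiple=1.0):
    """Return IDs of probes passing both criteria, lowest cross-hybridization first."""
    limit = max_nontarget_multiple * bg_nontarget
    kept = []
    for r in results:
        # Criterion 1: probe minus mismatch must exceed a fraction of background.
        if r.pm_target - r.mm_target <= min_diff_fraction * bg_target:
            continue
        # Criterion 2: with the non-target pool, probe and mismatch must both
        # stay at or below the chosen multiple of background.
        if r.pm_nontarget > limit or r.mm_nontarget > limit:
            continue
        kept.append(r)
    # Prefer probes with the lowest signal from non-target nucleic acids.
    kept.sort(key=lambda r: max(r.pm_nontarget, r.mm_nontarget))
    return [r.probe_id for r in kept]


# Hypothetical intensities, for illustration only.
results = [
    ProbeResult("p1", 900.0, 200.0, 40.0, 35.0),
    ProbeResult("p2", 300.0, 280.0, 30.0, 30.0),    # difference too small
    ProbeResult("p3", 800.0, 150.0, 450.0, 120.0),  # cross-hybridizes
    ProbeResult("p4", 600.0, 100.0, 20.0, 25.0),
]
print(select_probes(results, bg_target=100.0, bg_nontarget=100.0))  # ['p4', 'p1']
```

Loosening `max_nontarget_multiple` toward the 5 times background end of the stated range would also admit `p3`.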

In a preferred embodiment, the multiplicity of probes can include every different probe of length n that is complementary to a subsequence of the target nucleic acid. The probes can range from about 10 to about 50 nucleotides in length. The array is preferably a high density array as described above. Similarly, the hybridization methods, conditions, times, fluid volumes and detection methods are as described above and herein below.
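
For illustration, every probe of length n complementary to a target can be enumerated by sliding a window along the target and taking the reverse complement of each window, because probe and target pair in antiparallel orientation. The sketch assumes the target is written 5′ to 3′ as DNA (RNA input has U read as T) and contains only A, C, G and T/U; the function names are hypothetical.

```python
# Translation table mapping each DNA base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_probes(target, n):
    """Every length-n probe complementary to a subsequence of `target`.

    Returns (offset, probe) pairs, where offset is the 0-based start of the
    target subsequence to which the probe is complementary.
    """
    target = target.upper().replace("U", "T")  # treat RNA input as DNA
    if not 0 < n <= len(target):
        raise ValueError("n must be between 1 and len(target)")
    return [(i, reverse_complement(target[i:i + n]))
            for i in range(len(target) - n + 1)]


for offset, probe in tile_probes("ATGGCCAAGT", 8):
    print(offset, probe)
# 0 TTGGCCAT
# 1 CTTGGCCA
# 2 ACTTGGCC
```

A target of length L yields L − n + 1 such probes, which is the candidate pool that the selection method above would then prune.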

In addition, this invention provides a composition comprising an array of oligonucleotide probes immobilized on a substrate, where the array comprises more than 100 different oligonucleotides, each different oligonucleotide is localized in a predetermined region of the solid support, and the density of the array is greater than about 60 different oligonucleotides per 1 cm² of substrate. The oligonucleotide probes are specifically hybridized to one or more fluorescently labeled nucleic acids such that the fluorescence in each region of the array is indicative of the level of expression of each of a multiplicity of preselected genes. The array is preferably a high density array as described above and may further comprise expression level controls, mismatch controls and normalization controls as described herein.

Finally, this invention provides kits for simultaneously monitoring expression levels of a multiplicity of genes. The kits include an array of immobilized oligonucleotide probes complementary to subsequences of the multiplicity of target genes, as described above. In one embodiment, the array comprises at least 100 different oligonucleotide probes and the density of the array is greater than about 60 different oligonucleotides per 1 cm² of surface. The kit may also include instructions describing the use of the array for detection and/or quantification of expression levels of the multiplicity of genes. The kit may additionally include one or more of the following: buffers, hybridization mix, wash and read solutions, labels, labeling reagents (enzymes, etc.), “control” nucleic acids, software for probe selection, array reading or data analysis, and any of the other materials or reagents described herein for the practice of the claimed methods.

Definitions.

The phrase “massively parallel screening” refers to the simultaneous screening of at least about 100, preferably about 1000, more preferably about 10,000, and most preferably about 1,000,000 different nucleic acid hybridizations.

The terms “nucleic acid” or “nucleic acid molecule” refer to a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form and, unless otherwise limited, encompass known analogs of natural nucleotides that can function in a similar manner to naturally occurring nucleotides.

An oligonucleotide is a single-stranded nucleic acid ranging in length from 2 to about 500 bases.

As used herein, a “probe” is defined as an oligonucleotide capable of binding to a target nucleic acid of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing via hydrogen bond formation. As used herein, an oligonucleotide probe may include natural (i.e., A, G, C, or T) or modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases in an oligonucleotide probe may be joined by a linkage other than a phosphodiester bond, so long as it does not interfere with hybridization. Thus, oligonucleotide probes may be peptide nucleic acids in which the constituent bases are joined by peptide bonds rather than phosphodiester linkages.

The term “target nucleic acid” refers to a nucleic acid (often derived from a biological sample) to which the oligonucleotide probe is designed to specifically hybridize. It is either the presence or absence of the target nucleic acid that is to be detected, or the amount of the target nucleic acid that is to be quantified. The target nucleic acid has a sequence that is complementary to the nucleic acid sequence of the corresponding probe directed to the target. The term target nucleic acid may refer to the specific subsequence of a larger nucleic acid to which the probe is directed or to the overall sequence (e.g., gene or mRNA) whose expression level it is desired to detect. The difference in usage will be apparent from context.

“Subsequence” refers to a sequence of nucleic acids that comprises part of a longer sequence of nucleic acids.

The term “complexity” is used here according to the standard meaning of this term as established by Britten et al. Methods of Enzymol. 29:363 (1974). See also Cantor and Schimmel, Biophysical Chemistry: Part III, at 1228-1230 for further explanation of nucleic acid complexity.

“Bind(s) substantially” refers to complementary hybridization between a probe nucleic acid and a target nucleic acid and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the target polynucleotide sequence.

The phrase “hybridizing specifically to” refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular DNA or RNA). The term “stringent conditions” refers to conditions under which a probe will hybridize to its target subsequence, but to no other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target sequence hybridize to the target sequence at equilibrium. (As the target sequences are generally present in excess, at Tm, 50% of the probes are occupied at equilibrium.) Typically, stringent conditions will be those in which the salt concentration is at least about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.
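
As a rough worked example of choosing a temperature about 5° C. below Tm, the sketch below estimates Tm with the Wallace rule (2° C. per A or T, 4° C. per G or C). That rule is a swapped-in approximation, not a method of this disclosure: it is generally quoted only for short oligonucleotides at high salt and ignores the ionic strength, pH and concentration dependence described above. The function names are hypothetical.

```python
def wallace_tm(probe):
    """Approximate Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C).

    Crude estimate only; it ignores salt, pH and strand concentration.
    """
    probe = probe.upper()
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2 * at + 4 * gc


def stringent_temperature(probe, offset=5):
    """Temperature `offset` degrees C below the estimated Tm."""
    return wallace_tm(probe) - offset


probe = "GCGCATATGCGCATAT"  # hypothetical 16-mer: 8 A/T and 8 G/C
print(wallace_tm(probe), stringent_temperature(probe))  # 48 43
```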

The term “mismatch control” refers to a probe that has a sequence deliberately selected not to be perfectly complementary to a particular target sequence. The mismatch control typically has a corresponding test probe that is perfectly complementary to the same particular target sequence. The mismatch may comprise one or more bases. While the mismatch(es) may be located anywhere in the mismatch probe, terminal mismatches are less desirable because a terminal mismatch is less likely to prevent hybridization of the target sequence. In a particularly preferred embodiment, the mismatch is located at or near the center of the probe, where it is most likely to destabilize the duplex with the target sequence under the test hybridization conditions.
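
A minimal sketch of deriving a mismatch control with a single substitution at the center of a test probe, as in the preferred embodiment above. Substituting the complementary base at the central position is an assumed convention chosen for illustration; for even-length probes the code uses the upper of the two middle positions. Names and the example probe are hypothetical.

```python
# Watson-Crick complement used for the central substitution (assumed convention).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def central_mismatch(probe):
    """Mismatch control differing from `probe` only at its central base."""
    probe = probe.upper()
    mid = len(probe) // 2
    return probe[:mid] + PAIR[probe[mid]] + probe[mid + 1:]


pm = "ACGTACGTACGTACGTACGT"  # hypothetical 20-mer test probe
mm = central_mismatch(pm)
print(mm)  # ACGTACGTACCTACGTACGT
```

Because the pair differs at a single internal base, the difference between their signals reflects mainly the contribution of specific hybridization.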

The terms “background” or “background signal intensity” refer to hybridization signals resulting from non-specific binding, or other interactions, between the labeled target nucleic acids and components of the oligonucleotide array (e.g., the oligonucleotide probes, control probes, the array substrate, etc.). Background signals may also be produced by intrinsic fluorescence of the array components themselves. A single background signal can be calculated for the entire array, or a different background signal may be calculated for each target nucleic acid. In a preferred embodiment, background is calculated as the average hybridization signal intensity for the lowest 5% to 10% of the probes in the array, or, where a different background signal is calculated for each target gene, for the lowest 5% to 10% of the probes for each gene. Of course, one of skill in the art will appreciate that where the probes to a particular gene hybridize well and thus appear to be specifically binding to a target sequence, they should not be used in a background signal calculation. Alternatively, background may be calculated as the average hybridization signal intensity produced by hybridization to probes that are not complementary to any sequence found in the sample (e.g., probes directed to nucleic acids of the opposite sense or to genes not found in the sample, such as bacterial genes where the sample is mammalian nucleic acids). Background can also be calculated as the average signal intensity produced by regions of the array that lack any probes at all.
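
The preferred background calculation above (the average intensity of the lowest 5% to 10% of probes, either array-wide or per gene) can be sketched as follows. The sketch assumes raw probe intensities as plain lists of numbers and does not implement the exclusion of well-hybridizing probes mentioned above; names and example values are hypothetical.

```python
import math
from statistics import mean


def background(intensities, fraction=0.05):
    """Mean intensity of the lowest `fraction` of probes (always at least one probe)."""
    if not intensities:
        raise ValueError("no probe intensities supplied")
    k = max(1, math.ceil(len(intensities) * fraction))
    return mean(sorted(intensities)[:k])


def background_by_gene(intensities_by_gene, fraction=0.05):
    """Separate background value computed from each target gene's probe set."""
    return {gene: background(values, fraction)
            for gene, values in intensities_by_gene.items()}


array = [float(x) for x in range(10, 210, 10)]  # 20 hypothetical probes: 10.0 ... 200.0
print(background(array, 0.05), background(array, 0.10))  # 10.0 15.0
```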

The term “quantifying”, when used in the context of quantifying transcription levels of a gene, can refer to absolute or to relative quantification. Absolute quantification may be accomplished by inclusion of known concentration(s) of one or more target nucleic acids (e.g., control nucleic acids such as Bio B, or known amounts of the target nucleic acids themselves) and referencing the hybridization intensity of unknowns to the known target nucleic acids (e.g., through generation of a standard curve). Alternatively, relative quantification can be accomplished by comparison of hybridization signals between two or more genes, or between two or more treatments, to quantify the changes in hybridization intensity and, by implication, transcription level.
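
For absolute quantification, one simple form of standard curve is a least-squares line fitted to the intensities of control nucleic acids spiked in at known concentrations, which is then inverted to read concentrations for unknowns. A linear fit is an assumption made here for illustration and is only reasonable over a range where signal responds linearly to concentration. The concentrations, intensities and function names are hypothetical.

```python
def fit_standard_curve(concentrations, intensities):
    """Ordinary least-squares fit of intensity = slope * concentration + intercept."""
    n = len(concentrations)
    if n < 2 or n != len(intensities):
        raise ValueError("need at least two paired points")
    mx = sum(concentrations) / n
    my = sum(intensities) / n
    sxx = sum((x - mx) ** 2 for x in concentrations)
    sxy = sum((x - mx) * (y - my) for x, y in zip(concentrations, intensities))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate_concentration(intensity, slope, intercept):
    """Invert the standard curve to read a concentration off a measured intensity."""
    return (intensity - intercept) / slope


known_pm = [0.5, 1.0, 5.0, 10.0, 50.0]       # hypothetical spike-in concentrations (pM)
signal = [40.0, 50.0, 130.0, 230.0, 1030.0]  # hypothetical intensities (20 * c + 30)
slope, intercept = fit_standard_curve(known_pm, signal)
print(round(estimate_concentration(530.0, slope, intercept), 3))  # 25.0
```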

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows hybridization intensity plotted as a function of the concentration of target mRNA. Graphs A and B show the hybridization intensity of IL-4 RNA hybridized to the high density array of Example 1. Graph B expands the ordinate of graph A to show the low concentration values. Graphs C and D show hybridization intensity plotted as a function of target RNA concentration for a collection of different target RNAs. The graphs show the average values of the 1000 highest intensity probes. Graph D expands the ordinate of graph C to show the low concentration values.

FIG. 2 shows a plot of hybridization intensity for mouse library RNA, and for mouse library RNA spiked with mCTLA8, IL-6, IL-3, IFN-γ, and IL-12p40 at 10 pM or 50 pM. The data presented are based upon approximately the best (optimal) 10% of the probes to each gene, where the optimal probes are selected according to the method disclosed herein.

FIG. 3 shows a plot of the data from Example 1 (FIG. 2) with the ordinate condensed to show the constitutively expressed GAPDH and actin genes and the intrinsically expressed IL-10 gene.

DETAILED DESCRIPTION

This invention provides methods of monitoring (detecting and/or quantifying) the expression levels of one or more genes. The methods involve hybridization of a nucleic acid target sample to a high density array of nucleic acid probes and then quantifying the amount of target nucleic acids hybridized to each probe in the array.

While nucleic acid hybridization has been used for some time to determine the expression levels of various genes (e.g., Northern blot), it was a surprising discovery of this invention that high density arrays are suitable for the quantification of small variations in expression (transcription) levels of a gene in the presence of a large population of heterogeneous nucleic acids. The signal may be present at a concentration of less than about 1 in 1,000, and is often present at a concentration less than 1 in 10,000, more preferably less than about 1 in 50,000, and most preferably less than about 1 in 100,000 or even 1 in 1,000,000.

Prior to this invention, it was expected that hybridization of such a complex mixture to a high density array might overwhelm the available probes and make it impossible to detect the presence of low-level target nucleic acids. It was thus unclear that a low level signal could be isolated and detected in the presence of misleading signals due to cross-hybridization and non-specific binding to both substrate and probe.

It was a surprising discovery that, to the contrary, high density arrays are particularly well suited for monitoring expression of a multiplicity of genes and provide a level of sensitivity and discrimination hitherto unexpected.

Preferred high density arrays of this invention comprise greater than about 100, preferably greater than about 1000, more preferably greater than about 16,000, and most preferably greater than about 65,000 or 250,000 or even greater than about 1,000,000 different oligonucleotide probes. The oligonucleotide probes range from about 5 to about 50 nucleotides, more preferably from about 10 to about 40 nucleotides, and most preferably from about 15 to about 40 nucleotides in length.

The location and sequence of each different oligonucleotide probe sequence in the array is known. Moreover, the large number of different probes occupies a relatively small area, providing a high density array having a probe density of generally greater than about 60, more generally greater than about 100, most generally greater than about 600, often greater than about 1000, more often greater than about 5,000, most often greater than about 10,000, preferably greater than about 40,000, more preferably greater than about 100,000, and most preferably greater than about 400,000 different oligonucleotide probes per cm². The small surface area of the array (often less than about 10 cm², preferably less than about 5 cm², more preferably less than about 2 cm², and most preferably less than about 1.6 cm²) permits extremely uniform hybridization conditions (temperature regulation, salt content, etc.), while the extremely large number of probes allows massively parallel processing of hybridizations.

It was a discovery of this invention that the use of high density arrays for expression monitoring provides a number of advantages not found with other methods. For example, the use of large numbers of different probes that specifically bind to the transcription product of a particular target gene provides a high degree of redundancy and internal control that permits optimization of probe sets for effective detection of particular target genes and minimizes the possibility of errors due to cross-reactivity with other nucleic acid species.

Apparently suitable probes often prove ineffective for expression monitoring by hybridization. For example, certain subsequences of a particular target gene may be found in other regions of the genome, and probes directed to these subsequences will cross-hybridize with the other regions and not provide a signal that is a meaningful measure of the expression level of the target gene. Even probes that show little cross-reactivity may be unsuitable because they show poor hybridization due to the formation of structures that prevent effective hybridization. Finally, in sets with large numbers of probes, it is difficult to identify hybridization conditions that are optimal for all the probes in a set. Because of the high degree of redundancy provided by the large number of probes for each target gene, it is possible to eliminate those probes that function poorly under a given set of hybridization conditions and still retain enough probes to a particular target gene to provide an extremely sensitive and reliable measure of the expression level (transcription level) of that gene.

In addition, the use of large numbers of different probes to each target gene makes it possible to monitor expression of families of closely related nucleic acids. The probes may be selected to hybridize both with subsequences that are conserved across the family and with subsequences that differ among the different nucleic acids in the family. Thus, hybridization with such arrays permits simultaneous monitoring of the various members of a gene family even where the various genes are approximately the same size and have high levels of homology. Such measurements are difficult or impossible with traditional hybridization methods.

Because the high density arrays contain such a large number of probes, it is possible to provide numerous controls including, for example, controls for variations or mutations in a particular gene, controls for overall hybridization conditions, controls for sample preparation conditions, controls for metabolic activity of the cell from which the nucleic acids are derived, and mismatch controls for non-specific binding or cross-hybridization.

Finally, because of the small area occupied by the high density arrays, hybridization may be carried out in extremely small fluid volumes (e.g., 250 μl or less, more preferably 100 μl or less, and most preferably 10 μl or less). In small volumes, hybridization may proceed very rapidly. In addition, hybridization conditions are extremely uniform throughout the sample, and the hybridization format is amenable to automated processing.

This invention demonstrates that hybridization with high density oligonucleotide probe arrays provides an effective means of monitoring expression of a multiplicity of genes. In addition, this invention provides methods of sample treatment, array designs, and methods of probe selection that optimize signal detection at extremely low concentrations in complex nucleic acid mixtures.

The expression monitoring methods of this invention may be used in a wide variety of circumstances including detection of disease, identification of differential gene expression between two samples (e.g., a pathological as compared to a healthy sample), screening for compositions that upregulate or downregulate the expression of particular genes, and so forth.

In one preferred embodiment, the methods of this invention are used to monitor the expression (transcription) levels of nucleic acids whose expression is altered in a disease state. For example, a cancer may be characterized by the overexpression of a particular marker such as the HER2 (c-erbB-2/neu) proto-oncogene in the case of breast cancer. Similarly, overexpression of receptor tyrosine kinases (RTKs) is associated with the etiology of a number of tumors including carcinomas of the breast, liver, bladder and pancreas, as well as glioblastomas, sarcomas and squamous carcinomas (see Carpenter, Ann. Rev. Biochem., 56: 881-914 (1987)). Conversely, a cancer (e.g., colorectal, lung and breast) may be characterized by the mutation or underexpression of a tumor suppressor gene such as p53 (see, e.g., Tominaga et al. Critical Rev. in Oncogenesis, 3: 257-282 (1992)).

The materials and methods of this invention are typically used to monitor the expression of a multiplicity of different genes simultaneously. Thus, in one embodiment, the invention provides for simultaneous monitoring of at least about 10, preferably at least about 100, more preferably at least about 1000, and most preferably at least about 10,000 different genes.

I. Methods of Monitoring Gene Expression.

Generally, the methods of monitoring gene expression of this invention involve (1) providing a pool of target nucleic acids comprising RNA transcript(s) of one or more target gene(s), or nucleic acids derived from the RNA transcript(s); (2) hybridizing the nucleic acid sample to a high density array of probes (including control probes); and (3) detecting the hybridized nucleic acids and calculating a relative expression (transcription) level.

A) Providing a Nucleic Acid Sample.

One of skill in the art will appreciate that in order to measure the transcription level (and thereby the expression level) of a gene or genes, it is desirable to provide a nucleic acid sample comprising mRNA transcript(s) of the gene or genes, or nucleic acids derived from the mRNA transcript(s). As used herein, a nucleic acid derived from an mRNA transcript refers to a nucleic acid for whose synthesis the mRNA transcript or a subsequence thereof has ultimately served as a template. Thus, a cDNA reverse transcribed from an mRNA, an RNA transcribed from that cDNA, a DNA amplified from the cDNA, an RNA transcribed from the amplified DNA, etc., are all derived from the mRNA transcript, and detection of such derived products is indicative of the presence and/or abundance of the original transcript in a sample. Thus, suitable samples include, but are not limited to, mRNA transcripts of the gene or genes, cDNA reverse transcribed from the mRNA, cRNA transcribed from the cDNA, DNA amplified from the genes, RNA transcribed from amplified DNA, and the like.

In a particularly preferred embodiment, where it is desired to quantify the transcription level (and thereby expression) of one or more genes in a sample, the nucleic acid sample is one in which the concentration of the mRNA transcript(s) of the gene or genes, or the concentration of the nucleic acids derived from the mRNA transcript(s), is proportional to the transcription level (and therefore expression level) of that gene. Similarly, it is preferred that the hybridization signal intensity be proportional to the amount of hybridized nucleic acid. While it is preferred that the proportionality be relatively strict (e.g., a doubling in transcription rate results in a doubling in mRNA transcript in the sample nucleic acid pool and a doubling in hybridization signal), one of skill will appreciate that the proportionality can be more relaxed and even non-linear. Thus, for example, an assay where a 5-fold difference in concentration of the target mRNA results in a 3- to 6-fold difference in hybridization intensity is sufficient for most purposes. Where more precise quantification is required, appropriate controls can be run to correct for variations introduced in sample preparation and hybridization as described herein. In addition, serial dilutions of “standard” target mRNAs can be used to prepare calibration curves according to methods well known to those of skill in the art. Of course, where simple detection of the presence or absence of a transcript is desired, no elaborate control or calibration is required.
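One common way to build such a calibration curve, offered here only as a sketch, is to fit a straight line to log(signal) versus log(concentration) for the serial dilutions and then invert the fit for unknown samples. The function names and the dilution values below are hypothetical; a real assay would use its own standards and may need a different curve model.

```python
import math


def fit_log_log(concs, signals):
    """Least-squares fit of log10(signal) = a + b * log10(conc)."""
    xs = [math.log10(c) for c in concs]
    ys = [math.log10(s) for s in signals]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope: 1.0 means strict proportionality
    a = mean_y - b * mean_x    # intercept, in log10 units
    return a, b


def concentration_from_signal(signal, a, b):
    """Invert the fitted curve to estimate an unknown concentration."""
    return 10 ** ((math.log10(signal) - a) / b)


# Hypothetical serial dilution of a "standard" mRNA (arbitrary units).
standard_concs = [1, 2, 4, 8]
standard_signals = [100, 200, 400, 800]
a, b = fit_log_log(standard_concs, standard_signals)
estimate = concentration_from_signal(300, a, b)
```

Under this power-law model, a 5-fold change in concentration gives a 5^b-fold change in signal, so the "3- to 6-fold" tolerance above corresponds to a fitted slope b between roughly 0.68 and 1.11.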

In the simplest embodiment, such a nucleic acid sample is the total mRNA isolated from a biological sample. The term “biological sample”, as used herein, refers to a sample obtained from an organism or from components (e.g., cells) of an organism. The sample may be of any biological tissue or fluid. Frequently the sample will be a “clinical sample”, which is a sample derived from a patient. Such samples include, but are not limited to, sputum, blood, blood cells (e.g., white cells), tissue or fine needle biopsy samples, urine, peritoneal fluid, and pleural fluid, or cells therefrom. Biological samples may also include sections of tissues such as frozen sections taken for histological purposes.

The nucleic acid (either genomic DNA or mRNA) may be isolated from the sample according to any of a number of methods well known to those of skill in the art. One of skill will appreciate that where alterations in the copy number of a gene are to be detected, genomic DNA is preferably isolated. Conversely, where expression levels of a gene or genes are to be detected, RNA (mRNA) is preferably isolated.

Methods of isolating total mRNA are well known to those of skill in the art. For example, methods of isolation and purification of nucleic acids are described in detail in Chapter 3 of Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation, P. Tijssen, ed. Elsevier, N.Y. (1993).

In a preferred embodiment, the total nucleic acid is isolated from a given sample using, for example, an acid guanidinium-phenol-chloroform extraction method, and polyA⁺ mRNA is isolated by oligo dT column chromatography or by using (dT)n magnetic beads (see, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd ed.), Vols. 1-3, Cold Spring Harbor Laboratory, (1989), or Current Protocols in Molecular Biology, F. Ausubel et al., ed. Greene Publishing and Wiley-Interscience, New York (1987)).

Frequently, it is desirable to amplify the nucleic acid sample prior to hybridization. One of skill in the art will appreciate that whatever amplification method is used, if a quantitative result is desired, care must be taken to use a method that maintains or controls for the relative frequencies of the amplified nucleic acids.

Methods of “quantitative” amplification are well known to those of skill in the art. For example, quantitative PCR involves simultaneously co-amplifying a known quantity of a control sequence using the same primers. This provides an internal standard that may be used to calibrate the PCR reaction. The high density array may then include probes specific to the internal standard for quantification of the amplified nucleic acid.

One preferred internal standard is a synthetic AW106 cRNA. The AW106 cRNA is combined with RNA isolated from the sample according to standard techniques known to those of skill in the art. The RNA is then reverse transcribed using a reverse transcriptase to provide copy DNA (cDNA). The cDNA sequences are then amplified (e.g., by PCR) using labeled primers. The amplification products are separated, typically by electrophoresis, and the amount of radioactivity (proportional to the amount of amplified product) is determined. The amount of mRNA in the sample is then calculated by comparison with the signal produced by the known AW106 RNA standard. Detailed protocols for quantitative PCR are provided in PCR Protocols, A Guide to Methods and Applications, Innis et al., Academic Press, Inc. N.Y., (1990).
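The final comparison step can be sketched as a simple ratio, under the assumption (not stated above, and only approximately true in practice) that the target and the AW106 standard amplify with equal efficiency. The function name and the counts below are hypothetical; the result is in whatever units the standard amount is given in.

```python
def target_amount(signal_target: float, signal_standard: float,
                  standard_amount: float) -> float:
    """Estimate the starting amount of target mRNA from a co-amplified standard.

    Assumes the target and the internal standard amplify with equal efficiency,
    so the ratio of their product signals equals the ratio of starting amounts.
    """
    if signal_standard <= 0:
        raise ValueError("standard signal must be positive")
    return standard_amount * signal_target / signal_standard


# Hypothetical radioactivity counts, with 0.5 units of standard spiked in.
amount = target_amount(signal_target=12_000, signal_standard=4_000,
                       standard_amount=0.5)
```

Where amplification efficiencies differ, this ratio is biased, which is one reason the standard is chosen to share primers with the target.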

Other suitable amplification methods include, but are not limited to, polymerase chain reaction (PCR) (Innis, et al., PCR Protocols. A Guide to Methods and Application. Academic Press, Inc. San Diego, (1990)), ligase chain reaction (LCR) (see Wu and Wallace, Genomics, 4: 560 (1989), Landegren, et al., Science, 241: 1077 (1988) and Barringer, et al., Gene, 89: 117 (1990)), transcription amplification (Kwoh, et al., Proc. Natl. Acad. Sci. USA, 86: 1173 (1989)), and self-sustained sequence replication (Guatelli, et al., Proc. Nat. Acad. Sci. USA, 87: 1874 (1990)).

In a particularly preferred embodiment, the sample mRNA is reverse transcribed with a reverse transcriptase and a primer consisting of oligo dT and a sequence encoding the phage T7 promoter to provide a single-stranded DNA template. The second DNA strand is polymerized using a DNA polymerase. After synthesis of double-stranded cDNA, T7 RNA polymerase is added and RNA is transcribed from the cDNA template. Successive rounds of transcription from each single cDNA template result in amplified RNA. Methods of in vitro polymerization are well known to those of skill in the art (see, e.g., Sambrook, supra), and this particular method is described in detail by Van Gelder, et al., Proc. Natl. Acad. Sci. USA, 87: 1663-1667 (1990), who demonstrate that in vitro amplification according to this method preserves the relative frequencies of the various RNA transcripts. Moreover, Eberwine et al., Proc. Natl. Acad. Sci. USA, 89: 3010-3014 provide a protocol that uses two rounds of amplification via in vitro transcription to achieve greater than 10⁶-fold amplification of the original starting material, thereby permitting expression monitoring even where biological samples are limited.

It will be appreciated by one of skill in the art that the direct transcription method described above provides an antisense RNA (aRNA) pool. Where antisense RNA is used as the target nucleic acid, the oligonucleotide probes provided in the array are chosen to be complementary to subsequences of the antisense nucleic acids. Conversely, where the target nucleic acid pool is a pool of sense nucleic acids, the oligonucleotide probes are selected to be complementary to subsequences of the sense nucleic acids. Finally, where the nucleic acid pool is double stranded, the probes may be of either sense, as the target nucleic acids include both sense and antisense strands.
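Choosing a probe "complementary to subsequences" of the target amounts to taking the reverse complement of the target subsequence. A minimal sketch, assuming DNA probes written 5′ to 3′ and targets given as RNA or DNA strings; the function name and the six-base sequences are hypothetical.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def probe_for(target: str) -> str:
    """Return a DNA probe (5'->3') complementary to a target subsequence (5'->3')."""
    return "".join(_COMPLEMENT[base] for base in reversed(target.upper()))


sense_mrna = "AUGGCC"        # hypothetical sense subsequence
antisense_arna = "GGCCAU"    # its antisense (aRNA) counterpart
probe_vs_sense = probe_for(sense_mrna)
probe_vs_antisense = probe_for(antisense_arna)
```

Note that the probe for an antisense (aRNA) target reads like the original sense sequence (with T in place of U), which is why the probe sense must match the sense of the target pool.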

The protocols cited above include methods of generating pools of either sense or antisense nucleic acids. Indeed, one approach can be used to generate either sense or antisense nucleic acids as desired. For example, the cDNA can be directionally cloned into a vector (e.g., Stratagene's pBluescript II KS (+) phagemid) such that it is flanked by the T3 and T7 promoters. In vitro transcription with the T3 polymerase will produce RNA of one sense (the sense depending on the orientation of the insert), while in vitro transcription with the T7 polymerase will produce RNA having the opposite sense. Other suitable cloning systems include phage lambda vectors designed for Cre-loxP plasmid subcloning (see, e.g., Palazzolo et al., Gene, 88: 25-36 (1990)).

In a particularly preferred embodiment, a high activity RNA polymerase(e.g. about 2500 units/μL for T7, available from Epicentre Technologies)is used.

B) Labeling Nucleic Acids.

In a preferred embodiment, the hybridized nucleic acids are detected by detecting one or more labels attached to the sample nucleic acids. The labels may be incorporated by any of a number of means well known to those of skill in the art. However, in a preferred embodiment, the label is incorporated simultaneously during the amplification step in the preparation of the sample nucleic acids. Thus, for example, polymerase chain reaction (PCR) with labeled primers or labeled nucleotides will provide a labeled amplification product. In a preferred embodiment, transcription amplification, as described above, using a labeled nucleotide (e.g., fluorescein-labeled UTP and/or CTP) incorporates a label into the transcribed nucleic acids.

Alternatively, a label may be added directly to the original nucleic acid sample (e.g., mRNA, polyA mRNA, cDNA, etc.) or to the amplification product after the amplification is completed. Means of attaching labels to nucleic acids are well known to those of skill in the art and include, for example, nick translation or end-labeling (e.g., with a labeled RNA) by kinasing of the nucleic acid and subsequent attachment (ligation) of a nucleic acid linker joining the sample nucleic acid to a label (e.g., a fluorophore).

Detectable labels suitable for use in the present invention include any composition detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical or chemical means. Useful labels in the present invention include biotin for staining with labeled streptavidin conjugate, magnetic beads (e.g., Dynabeads™), fluorescent dyes (e.g., fluorescein, Texas Red, rhodamine, green fluorescent protein, and the like), radiolabels (e.g., ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P), enzymes (e.g., horseradish peroxidase, alkaline phosphatase and others commonly used in an ELISA), and colorimetric labels such as colloidal gold or colored glass or plastic (e.g., polystyrene, polypropylene, latex, etc.) beads. Patents teaching the use of such labels include U.S. Pat. Nos. 3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149; and 4,366,241.

Means of detecting such labels are well known to those of skill in the art. Thus, for example, radiolabels may be detected using photographic film or scintillation counters, and fluorescent markers may be detected using a photodetector to detect emitted light. Enzymatic labels are typically detected by providing the enzyme with a substrate and detecting the reaction product produced by the action of the enzyme on the substrate, and colorimetric labels are detected by simply visualizing the colored label.

The label may be added to the target (sample) nucleic acid(s) prior to or after the hybridization. So-called “direct labels” are detectable labels that are directly attached to or incorporated into the target (sample) nucleic acid prior to hybridization. In contrast, so-called “indirect labels” are joined to the hybrid duplex after hybridization. Often, the indirect label is attached to a binding moiety that has been attached to the target nucleic acid prior to the hybridization. Thus, for example, the target nucleic acid may be biotinylated before the hybridization. After hybridization, an avidin-conjugated fluorophore will bind the biotin-bearing hybrid duplexes, providing a label that is easily detected. For a detailed review of methods of labeling nucleic acids and detecting labeled hybridized nucleic acids, see Laboratory Techniques in Biochemistry and Molecular Biology, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, N.Y. (1993).

Fluorescent labels are preferred and are easily added during an in vitro transcription reaction. In a preferred embodiment, fluorescein-labeled UTP and CTP are incorporated into the RNA produced in an in vitro transcription reaction as described above.

C) Modifying Sample to Improve Signal/Noise Ratio.

The nucleic acid sample may be modified prior to hybridization to the high density probe array in order to reduce sample complexity, thereby decreasing background signal and improving the sensitivity of the measurement. In one embodiment, complexity reduction is achieved by selective degradation of background mRNA. This is accomplished by hybridizing the sample mRNA (e.g., polyA⁺ RNA) with a pool of DNA oligonucleotides that hybridize specifically with the regions to which the probes in the array specifically hybridize. In a preferred embodiment, the pool of oligonucleotides consists of the same probe oligonucleotides as found on the high density array.

The pool of oligonucleotides hybridizes to the sample mRNA, forming a number of double-stranded (hybrid duplex) nucleic acids. The hybridized sample is then treated with RNase A, a nuclease that specifically digests single-stranded RNA. The RNase A is then inhibited, using a protease and/or commercially available RNase inhibitors, and the double-stranded nucleic acids are then separated from the digested single-stranded RNA. This separation may be accomplished in a number of ways well known to those of skill in the art including, but not limited to, electrophoresis and gradient centrifugation. However, in a preferred embodiment, the pool of DNA oligonucleotides is provided attached to beads, thereby forming a nucleic acid affinity column. After digestion with the RNase A, the hybridized DNA is removed simply by denaturing the hybrid duplexes (e.g., by adding heat or increasing salt) and washing the previously hybridized mRNA off in an elution buffer.

The undigested mRNA fragments, which will be hybridized to the probes in the high density array, are then preferably end-labeled with a fluorophore attached to an RNA linker using an RNA ligase. This procedure produces a labeled sample RNA pool in which the nucleic acids that do not correspond to probes in the array are eliminated and are thus unavailable to contribute to a background signal.

Another method of reducing sample complexity involves hybridizing the mRNA with deoxyoligonucleotides that hybridize to regions bordering, on either side, the regions to which the high density array probes are directed. Treatment with RNase H selectively digests the RNA in the double-stranded regions (hybrid duplexes), leaving a pool of single-stranded mRNA corresponding to the short regions (e.g., 20 mer) that were formerly bounded by the deoxyoligonucleotide probes and which correspond to the targets of the high density array probes, together with longer mRNA sequences that correspond to regions between the targets of the probes of the high density array. The short RNA fragments are then separated from the long fragments (e.g., by electrophoresis), labeled if necessary as described above, and are then ready for hybridization with the high density probe array.

In a third approach, sample complexity reduction involves the selective removal of particular (preselected) mRNA messages. In particular, highly expressed mRNA messages that are not specifically probed by the probes in the high density array are preferably removed. This approach involves hybridizing the polyA⁺ mRNA with an oligonucleotide probe that specifically hybridizes to the preselected message close to the 3′ (polyA) end. The probe may be selected to provide high specificity and low cross-reactivity. Treatment of the hybridized message/probe complex with RNase H digests the double-stranded region, effectively removing the polyA⁺ tail from the rest of the message. The sample is then treated with methods that specifically retain or amplify polyA⁺ RNA (e.g., an oligo dT column or (dT)n magnetic beads). Such methods will not retain or amplify the selected message(s), as they are no longer associated with a polyA⁺ tail. These highly expressed messages are effectively removed from the sample, providing a sample that has reduced background mRNA.

II. Hybridization Array Design.

A) Probe Composition.

One of skill in the art will appreciate that an enormous number of array designs are suitable for the practice of this invention. The high density array will typically include a number of probes that specifically hybridize to the nucleic acids whose expression is to be detected. In addition, in a preferred embodiment, the array will include one or more control probes.

1) Test Probes.

In its simplest embodiment, the high density array includes “test probes”. These are oligonucleotides that range from about 5 to about 50 nucleotides, more preferably from about 10 to about 40 nucleotides, and most preferably from about 15 to about 40 nucleotides in length. These oligonucleotide probes have sequences complementary to particular subsequences of the genes whose expression they are designed to detect. Thus, the test probes are capable of specifically hybridizing to the target nucleic acid they are to detect.

In addition to test probes that bind the target nucleic acid(s) of interest, the high density array can contain a number of control probes. The control probes fall into three categories referred to herein as 1) Normalization controls; 2) Expression level controls; and 3) Mismatch controls.

2) Normalization Controls.

Normalization controls are oligonucleotide probes that are perfectly complementary to labeled reference oligonucleotides that are added to the nucleic acid sample. The signals obtained from the normalization controls after hybridization provide a control for variations in hybridization conditions, label intensity, “reading” efficiency and other factors that may cause the signal of a perfect hybridization to vary between arrays. In a preferred embodiment, signals (e.g., fluorescence intensity) read from all other probes in the array are divided by the signal (e.g., fluorescence intensity) from the control probes, thereby normalizing the measurements.
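The division step can be sketched in a few lines. This sketch assumes that, when several normalization controls are present, their mean signal is used as the divisor; the text above only says the signals are divided by the control signal, so the averaging and the probe identifiers are illustrative assumptions.

```python
from statistics import mean


def normalize(signals: dict, control_ids: list) -> dict:
    """Divide each non-control probe signal by the mean normalization-control signal."""
    reference = mean(signals[c] for c in control_ids)
    if reference <= 0:
        raise ValueError("normalization control signal must be positive")
    return {pid: s / reference
            for pid, s in signals.items() if pid not in control_ids}


# Hypothetical fluorescence intensities from one array.
raw = {"norm1": 1000, "norm2": 1200, "geneA": 550, "geneB": 2200}
normalized = normalize(raw, ["norm1", "norm2"])
```

Because each array is scaled by its own controls, normalized values from different arrays can be compared directly even if overall labeling or scanning efficiency differed.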

Virtually any probe may serve as a normalization control. However, it is recognized that hybridization efficiency varies with base composition and probe length. Preferred normalization probes are selected to reflect the average length of the other probes present in the array; however, they can be selected to cover a range of lengths. The normalization control(s) can also be selected to reflect the (average) base composition of the other probes in the array; however, in a preferred embodiment, only one or a few normalization probes are used, and they are selected such that they hybridize well (i.e., have no secondary structure) and do not match any target-specific probes.

Normalization probes can be localized at any position in the array or at multiple positions throughout the array to control for spatial variation in hybridization efficiency. In a preferred embodiment, the normalization controls are located at the corners or edges of the array as well as in the middle.

3) Expression Level Controls.

Expression level controls are probes that hybridize specifically with constitutively expressed genes in the biological sample. Expression level controls are designed to control for the overall health and metabolic activity of a cell. Examination of the covariance of an expression level control with the expression level of the target nucleic acid indicates whether measured changes or variations in the expression level of a gene are due to changes in the transcription rate of that gene or to general variations in the health of the cell. Thus, for example, when a cell is in poor health or lacking a critical metabolite, the expression levels of both an active target gene and a constitutively expressed gene are expected to decrease. The converse is also true. Thus, where the expression levels of an expression level control and the target gene both decrease or both increase, the change may be attributed to changes in the metabolic activity of the cell as a whole, not to differential expression of the target gene in question. Conversely, where the expression levels of the target gene and the expression level control do not covary, the variation in the expression level of the target gene is attributed to differences in regulation of that gene and not to overall variations in the metabolic activity of the cell.
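The covariance test above is described qualitatively. One simple way to make it concrete for two samples, offered only as a sketch, is to divide the target's fold change by the housekeeping control's fold change: a ratio near 1 suggests a global (metabolic) shift, while a large ratio suggests gene-specific regulation. The 2-fold threshold, the function name and the category labels are hypothetical choices, not values from the text.

```python
def classify_change(target_a: float, target_b: float,
                    control_a: float, control_b: float,
                    threshold: float = 2.0) -> str:
    """Classify a target's change between samples A and B against a housekeeping control.

    Returns "gene-specific" if the target changes much more (or less) than the
    control, "global" if both change together by at least the threshold, and
    "unchanged" otherwise.
    """
    target_fold = target_b / target_a
    control_fold = control_b / control_a
    relative = target_fold / control_fold   # target change not explained by the control
    if max(relative, 1 / relative) >= threshold:
        return "gene-specific"
    if max(target_fold, 1 / target_fold) >= threshold:
        return "global"                     # target and control covary
    return "unchanged"
```

For example, a target falling from 100 to 40 while a β-actin control falls from 1000 to 400 would be classed as "global", whereas a target rising 4-fold against a flat control would be "gene-specific".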

Virtually any constitutively expressed gene provides a suitable target for expression level controls. Typically, expression level control probes have sequences complementary to subsequences of constitutively expressed “housekeeping genes” including, but not limited to, the β-actin gene, the transferrin receptor gene, the GAPDH gene, and the like.

4) Mismatch Controls.

Mismatch controls may also be provided for the probes to the target genes, for expression level controls or for normalization controls. Mismatch controls are oligonucleotide probes identical to their corresponding test or control probes except for the presence of one or more mismatched bases. A mismatched base is a base selected so that it is not complementary to the corresponding base in the target sequence to which the probe would otherwise specifically hybridize. One or more mismatches are selected such that under appropriate hybridization conditions (e.g., stringent conditions) the test or control probe would be expected to hybridize with its target sequence, but the mismatch probe would not hybridize (or would hybridize to a significantly lesser extent). Preferred mismatch probes contain a central mismatch. Thus, for example, where a probe is a 20 mer, a corresponding mismatch probe will have the identical sequence except for a single base mismatch (e.g., substituting a G, a C or a T for an A) at any of positions 6 through 14 (the central mismatch).
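Generating a central-mismatch control from a perfect-match probe is a single-base substitution. In this sketch the replacement base is the complement of the original base (A↔T, G↔C), which is one of the substitutions allowed above but not the only one; the function name and the 20 mer sequence are hypothetical.

```python
from typing import Optional

_SWAP = {"A": "T", "T": "A", "G": "C", "C": "G"}


def central_mismatch(probe: str, position: Optional[int] = None) -> str:
    """Return a copy of a DNA probe with one base replaced at a central position.

    `position` is 1-based; by default the middle base is used (position 10 of a
    20 mer, within the 6-14 window). The replacement is the complementary base.
    """
    probe = probe.upper()
    if position is None:
        position = (len(probe) + 1) // 2
    if not 1 <= position <= len(probe):
        raise ValueError("position outside probe")
    i = position - 1
    return probe[:i] + _SWAP[probe[i]] + probe[i + 1:]


perfect_match = "ACGTACGTACGTACGTACGT"   # hypothetical 20 mer
mismatch = central_mismatch(perfect_match)
```

Passing an explicit `position` (for example 6 or 14) places the mismatch elsewhere in the central window.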

Mismatch probes thus provide a control for non-specific binding or cross-hybridization to a nucleic acid in the sample other than the target to which the probe is directed. Mismatch probes therefore indicate whether a hybridization is specific or not. For example, if the target is present, the perfect match probes should be consistently brighter than the mismatch probes. In addition, if all central mismatches are present, the mismatch probes can be used to detect a mutation. Finally, it was also a discovery of the present invention that the difference in intensity between the perfect match and the mismatch probe (I(PM)-I(MM)) provides a good measure of the concentration of the hybridized material.
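Both uses of the mismatch probes described above can be computed from the perfect match/mismatch pairs of one gene's probe set. This sketch assumes the per-pair I(PM)-I(MM) differences are summarized by their arithmetic mean, and reports the fraction of pairs in which the perfect match is brighter; the averaging, the function name and the intensities are illustrative assumptions.

```python
def pm_mm_summary(pairs):
    """Summarize (PM, MM) intensity pairs for one gene's probe set.

    Returns the mean of I(PM) - I(MM) across pairs and the fraction of pairs
    in which the perfect match probe is brighter than its mismatch control.
    """
    diffs = [pm - mm for pm, mm in pairs]
    avg_diff = sum(diffs) / len(diffs)
    frac_pm_brighter = sum(d > 0 for d in diffs) / len(diffs)
    return avg_diff, frac_pm_brighter


# Hypothetical intensities for four probe pairs directed to one target.
avg_diff, frac = pm_mm_summary([(500, 100), (450, 150), (300, 320), (600, 200)])
```

A consistently positive difference across most pairs suggests the target is present; the average difference then serves as the concentration-related measure.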

5) Sample Preparation/Amplification Controls.

The high density array may also include sample preparation/amplification control probes. These are probes that are complementary to subsequences of control genes selected because they do not normally occur in the nucleic acids of the particular biological sample being assayed. Suitable sample preparation/amplification control probes include, for example, probes to bacterial genes (e.g., Bio B) where the sample in question is a biological sample from a eukaryote.

The RNA sample is then spiked with a known amount of the nucleic acid to which the sample preparation/amplification control probe is directed before processing. Quantification of the hybridization of the sample preparation/amplification control probe then provides a measure of alteration in the abundance of the nucleic acids caused by processing steps (e.g., PCR, reverse transcription, in vitro transcription, etc.).
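One way to use that measure, sketched here under the assumption that processing alters the spike and the endogenous targets by the same factor (the text only says the spike provides a measure of the alteration), is to compute an observed-to-expected ratio for the spike and rescale target measurements by it. The function names and values are hypothetical, and both quantities must be in the same units.

```python
def processing_factor(observed_spike: float, expected_spike: float) -> float:
    """Ratio of measured to expected spike abundance after processing."""
    if expected_spike <= 0:
        raise ValueError("expected spike amount must be positive")
    return observed_spike / expected_spike


def correct_for_processing(observed: float, factor: float) -> float:
    """Rescale a measured target abundance by the processing factor."""
    return observed / factor


# Hypothetical: a Bio B spike expected at 50 units is measured at 40 units.
factor = processing_factor(observed_spike=40.0, expected_spike=50.0)
corrected = correct_for_processing(16.0, factor)
```

If target and spike are processed with different efficiencies (for example because of length or sequence differences), this single-factor correction will be only approximate.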

B) “Test Probe” Selection and Optimization.

In a preferred embodiment, oligonucleotide probes in the high density array are selected to bind specifically to the nucleic acid target to which they are directed with minimal non-specific binding or cross-hybridization under the particular hybridization conditions utilized. Because the high density arrays of this invention can contain in excess of 1,000,000 different probes, it is possible to provide every probe of a characteristic length that binds to a particular nucleic acid sequence. Thus, for example, the high density array can contain every possible 20 mer sequence complementary to an IL-2 mRNA.
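Enumerating every probe of a given length amounts to sliding a window along the mRNA and taking the reverse complement of each window, giving L - k + 1 probes for an mRNA of length L. A minimal sketch with a hypothetical function name; no actual IL-2 sequence is included, so the usage below uses a toy sequence.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def tile_probes(mrna: str, k: int = 20) -> list:
    """Return (start, probe) for every k-mer probe complementary to an mRNA.

    `start` is the 0-based position of the targeted window in the mRNA;
    probes are DNA written 5'->3'.
    """
    seq = mrna.upper().replace("U", "T")
    probes = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        probe = "".join(_COMPLEMENT[b] for b in reversed(window))
        probes.append((start, probe))
    return probes


toy = tile_probes("AUGGCCA", k=5)   # toy 7-nt sequence, 3 windows of length 5
```

For a real transcript of, say, 1,000 nucleotides, this yields 981 distinct 20 mer probes, which is well within the capacity of the arrays described here.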

There may, however, exist 20 mer subsequences that are not unique to the IL-2 mRNA. Probes directed to these subsequences are expected to cross-hybridize with occurrences of their complementary sequence in other regions of the sample genome. Similarly, other probes simply may not hybridize effectively under the hybridization conditions (e.g., due to secondary structure, or interactions with the substrate or other probes). Thus, in a preferred embodiment, the probes that show such poor specificity or hybridization efficiency are identified and may be excluded either from the high density array itself (e.g., during fabrication of the array) or from the post-hybridization data analysis.
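A first, purely computational screen for non-unique subsequences is to check whether each target k-mer also occurs in other known sequences. This sketch is a simplification and an assumption: it detects only exact, same-strand matches, whereas real cross-hybridization also occurs with near matches, which is why the empirical screening described below is still needed. The function names and sequences are hypothetical.

```python
def _norm(seq: str) -> str:
    """Upper-case and convert RNA to DNA letters so sequences compare directly."""
    return seq.upper().replace("U", "T")


def unique_kmer_starts(target: str, background: list, k: int = 20) -> list:
    """Return start positions of target k-mers absent from all background sequences.

    Exact, same-strand matching only; near matches are not detected.
    """
    seen = set()
    for seq in background:
        s = _norm(seq)
        for i in range(len(s) - k + 1):
            seen.add(s[i:i + k])
    t = _norm(target)
    return [i for i in range(len(t) - k + 1) if t[i:i + k] not in seen]


starts = unique_kmer_starts("ACGTTGCA", ["TTGCAAAA"], k=4)
```

Probes for the windows that survive this filter would still go through the hybridization-based selection, since secondary structure and substrate effects cannot be judged from sequence identity alone.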

Thus, in one embodiment, this invention provides a method of optimizing a probe set for detection of a particular gene. Generally, this method involves providing a high density array containing a multiplicity of probes of one or more particular length(s) that are complementary to subsequences of the mRNA transcribed by the target gene. In one embodiment, the high density array may contain every probe of a particular length that is complementary to a particular mRNA. The probes of the high density array are then hybridized with their target nucleic acid alone and then hybridized with a high complexity, high concentration nucleic acid sample that does not contain the targets complementary to the probes. Thus, for example, where the target nucleic acid is an RNA, the probes are first hybridized with their target nucleic acid alone and then hybridized with RNA made from a cDNA library (e.g., reverse transcribed polyA⁺ mRNA) where the sense of the hybridized RNA is opposite that of the target nucleic acid (to ensure that the high complexity sample does not contain targets for the probes). Those probes that show a strong hybridization signal with their target and little or no cross-hybridization with the high complexity sample are preferred probes for use in the high density arrays of this invention.

The high density array may additionally contain mismatch controls for each of the probes to be tested. In a preferred embodiment, the mismatch controls contain a central mismatch. Where both the mismatch control and the target probe show high levels of hybridization (e.g., the hybridization to the mismatch is nearly equal to or greater than the hybridization to the corresponding test probe), the test probe is preferably not used in the high density array.

In a particularly preferred embodiment, optimal probes are selected according to the following method: First, as indicated above, an array is provided containing a multiplicity of oligonucleotide probes complementary to subsequences of the target nucleic acid. The oligonucleotide probes may be of a single length or may span a variety of lengths ranging from 5 to 50 nucleotides. The high density array may contain every probe of a particular length that is complementary to a particular mRNA or may contain probes selected from various regions of particular mRNAs. For each target-specific probe, the array also contains a mismatch control probe, preferably a central mismatch control probe.

The oligonucleotide array is hybridized to a sample containing target nucleic acids having subsequences complementary to the oligonucleotide probes, and the difference in hybridization intensity between each probe and its mismatch control is determined. Only those probes where the difference between the probe and its mismatch control exceeds a threshold hybridization intensity (e.g., preferably greater than 10% of the background signal intensity, more preferably greater than 20% of the background signal intensity, and most preferably greater than 50% of the background signal intensity) are selected. Thus, only probes that show a strong signal compared to their mismatch control are selected.

The probe optimization procedure can optionally include a second round of selection. In this selection, the oligonucleotide probe array is hybridized with a nucleic acid sample that is not expected to contain sequences complementary to the probes. Thus, for example, where the probes are complementary to the RNA sense strand, a sample of antisense RNA is provided. Of course, other samples could be provided, such as samples from organisms or cell lines known to lack a particular gene or known not to express a particular gene.

Only those probes where both the probe and its mismatch control show hybridization intensities below a threshold value (e.g., less than about 5 times the background signal intensity, preferably equal to or less than about 2 times the background signal intensity, more preferably equal to or less than about 1 times the background signal intensity, and most preferably equal to or less than about half the background signal intensity) are selected. In this way, probes that show minimal non-specific binding are selected. Finally, in a preferred embodiment, the n probes (where n is the number of probes desired for each target gene) that pass both selection criteria and have the highest hybridization intensity for each target gene are selected for incorporation into the array or, where already present in the array, for subsequent data analysis. Of course, one of skill in the art will appreciate that either selection criterion could be used alone for selection of probes.
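The two selection rounds and the final top-n pick can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented procedure itself: the `ProbeReading` record, the `select_probes` function and its parameter names are hypothetical; both thresholds are expressed as fractions of a single background value (the defaults follow the "most preferably greater than 50%" and "about 1 times the background" figures above); and "highest hybridization intensity" is assumed to mean the perfect-match intensity in the target hybridization.

```python
from dataclasses import dataclass


@dataclass
class ProbeReading:
    """Intensities for one perfect-match (PM) probe and its central-mismatch (MM) control."""
    probe_id: str
    gene: str
    pm_target: float  # PM signal after hybridizing with the target nucleic acid
    mm_target: float  # MM signal, same hybridization
    pm_blank: float   # PM signal after hybridizing with a non-target (e.g. antisense) sample
    mm_blank: float   # MM signal, same non-target hybridization


def select_probes(readings, background, n, min_diff_frac=0.5, max_blank_frac=1.0):
    """Return {gene: [probe_id, ...]} with at most n selected probes per gene.

    Round 1 keeps probes whose PM - MM difference exceeds min_diff_frac * background.
    Round 2 keeps probes whose PM and MM both stay at or below
    max_blank_frac * background in the non-target hybridization.
    Survivors are ranked by PM intensity and the top n per gene are kept.
    """
    survivors = {}
    for r in readings:
        specific = (r.pm_target - r.mm_target) > min_diff_frac * background
        quiet = max(r.pm_blank, r.mm_blank) <= max_blank_frac * background
        if specific and quiet:
            survivors.setdefault(r.gene, []).append(r)
    return {
        gene: [r.probe_id for r in sorted(rs, key=lambda r: r.pm_target, reverse=True)[:n]]
        for gene, rs in survivors.items()
    }
```

Either round can be disabled by passing a permissive threshold, which mirrors the remark that either criterion may be used alone.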

III. Synthesis of High Density Arrays

Methods of forming high density arrays of oligonucleotides, peptides and other polymer sequences with a minimal number of synthetic steps are known. The oligonucleotide analogue array can be synthesized on a solid substrate by a variety of methods, including, but not limited to, light-directed chemical coupling and mechanically directed coupling. See Pirrung et al., U.S. Pat. No. 5,143,854 (see also PCT Application No. WO 90/15070) and Fodor et al., PCT Publication Nos. WO 92/10092 and WO 93/09668, which disclose methods of forming vast arrays of peptides, oligonucleotides and other molecules using, for example, light-directed synthesis techniques. See also Fodor et al., Science, 251: 767-77 (1991). These procedures for synthesis of polymer arrays are now referred to as VLSIPS™ procedures. Using the VLSIPS™ approach, one heterogeneous array of polymers is converted, through simultaneous coupling at a number of reaction sites, into a different heterogeneous array. See U.S. application Ser. Nos. 07/796,243 and 07/980,523.

The development of VLSIPS™ technology as described in the above-noted U.S. Pat. No. 5,143,854 and PCT patent publication Nos. WO 90/15070 and WO 92/10092 is considered pioneering technology in the fields of combinatorial synthesis and screening of combinatorial libraries. More recently, patent application Ser. No. 08/082,937, filed Jun. 25, 1993, describes methods for making arrays of oligonucleotide probes that can be used to check or determine a partial or complete sequence of a target nucleic acid and to detect the presence of a nucleic acid containing a specific oligonucleotide sequence.

In brief, the light-directed combinatorial synthesis of oligonucleotide arrays on a glass surface proceeds using automated phosphoramidite chemistry and chip masking techniques. In one specific implementation, a glass surface is derivatized with a silane reagent containing a functional group, e.g., a hydroxyl or amine group blocked by a photolabile protecting group. Photolysis through a photolithographic mask is used to selectively expose functional groups, which are then ready to react with incoming 5′-photoprotected nucleoside phosphoramidites. The phosphoramidites react only with those sites which are illuminated (and thus exposed by removal of the photolabile blocking group). Thus, the phosphoramidites add only to those areas selectively exposed in the preceding step. These steps are repeated until the desired array of sequences has been synthesized on the solid surface. Combinatorial synthesis of different oligonucleotide analogues at different locations on the array is determined by the pattern of illumination during synthesis and the order of addition of coupling reagents.
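To make the role of the illumination pattern concrete, here is a hypothetical Python sketch that plans masks for a set of probe sequences. It assumes the four phosphoramidites are coupled in a fixed, repeating A, C, G, T order (the order of reagent addition is not specified here) and that each step illuminates exactly the sites whose next base matches the amidite being coupled. `plan_masks` and `replay` are illustrative names, not part of any described system.

```python
def plan_masks(probes, cycle="ACGT"):
    """Plan illumination steps for light-directed synthesis (hypothetical sketch).

    probes maps a site name to the sequence to be built there, written in the
    order the bases are added. Returns a list of (base, lit_sites) steps: each step
    illuminates lit_sites, removing their photolabile groups, and then couples
    `base`, which attaches only at those deprotected sites.
    """
    for site, seq in probes.items():
        if set(seq) - set(cycle):
            raise ValueError(f"site {site!r} uses bases outside {cycle!r}")
    added = {site: 0 for site in probes}  # bases coupled so far at each site
    steps = []
    while any(added[s] < len(seq) for s, seq in probes.items()):
        for base in cycle:
            lit = sorted(s for s, seq in probes.items()
                         if added[s] < len(seq) and seq[added[s]] == base)
            if lit:  # skip amidite additions that would not reach any site
                steps.append((base, lit))
                for s in lit:
                    added[s] += 1
    return steps


def replay(steps, sites):
    """Rebuild each site's sequence from the planned steps, as a sanity check."""
    built = {s: "" for s in sites}
    for base, lit in steps:
        for s in lit:
            built[s] += base
    return built
```

Under this fixed-cycle assumption the number of coupling steps never exceeds four times the longest probe length, however many different probes share the substrate, which illustrates why many sequences can be built in parallel.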

In the event that an oligonucleotide analogue with a polyamide backbone is used in the VLSIPS™ procedure, it is generally inappropriate to use phosphoramidite chemistry to perform the synthetic steps, since the monomers do not attach to one another via a phosphate linkage. Instead, peptide synthetic methods are substituted. See, e.g., Pirrung et al., U.S. Pat. No. 5,143,854.

Peptide nucleic acids, which comprise a polyamide backbone and the bases found in naturally occurring nucleosides, are commercially available from, e.g., Biosearch, Inc. (Bedford, Mass.). Peptide nucleic acids are capable of binding to nucleic acids with high specificity and are considered “oligonucleotide analogues” for purposes of this disclosure.

In addition to the foregoing, additional methods which can be used to generate an array of oligonucleotides on a single substrate are described in co-pending application Ser. No. 07/980,523, filed Nov. 20, 1992, and Ser. No. 07/796,243, filed Nov. 22, 1991, and in PCT Publication No. WO 93/09668. In the methods disclosed in these applications, reagents are delivered to the substrate by either (1) flowing within a channel defined on predefined regions or (2) “spotting” on predefined regions. However, other approaches, as well as combinations of spotting and flowing, may be employed. In each instance, certain activated regions of the substrate are mechanically separated from other regions when the monomer solutions are delivered to the various reaction sites.

A typical “flow channel” method applied to the compounds and libraries of the present invention can generally be described as follows. Diverse polymer sequences are synthesized at selected regions of a substrate or solid support by forming flow channels on a surface of the substrate through which appropriate reagents flow or in which appropriate reagents are placed. For example, assume a monomer “A” is to be bound to the substrate in a first group of selected regions. If necessary, all or part of the surface of the substrate in all or a part of the selected regions is activated for binding by, for example, flowing appropriate reagents through all or some of the channels, or by washing the entire substrate with appropriate reagents. After placement of a channel block on the surface of the substrate, a reagent having the monomer A flows through or is placed in all or some of the channel(s). The channels provide fluid contact to the first selected regions, thereby binding the monomer A on the substrate directly or indirectly (via a spacer) in the first selected regions.

Thereafter, a monomer B is coupled to second selected regions, some of which may be included among the first selected regions. The second selected regions will be in fluid contact with a second flow channel(s) through translation, rotation, or replacement of the channel block on the surface of the substrate; through opening or closing a selected valve; or through deposition of a layer of chemical or photoresist. If necessary, a step is performed for activating at least the second regions. Thereafter, the monomer B is flowed through or placed in the second flow channel(s), binding monomer B at the second selected locations. In this particular example, the resulting sequences bound to the substrate at this stage of processing will be, for example, A, B, and AB. The process is repeated to form a vast array of sequences of desired length at known locations on the substrate.

After the substrate is activated, monomer A can be flowed through some of the channels, monomer B can be flowed through other channels, a monomer C can be flowed through still other channels, etc. In this manner, many or all of the reaction regions are reacted with a monomer before the channel block must be moved or the substrate must be washed and/or reactivated. By making use of many or all of the available reaction regions simultaneously, the number of washing and activation steps can be minimized.
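The flow-channel bookkeeping can be modelled with a short Python sketch. The names are hypothetical: a "placement" stands for one position of the channel block, and each channel is represented by the set of reaction regions it contacts. The first usage reproduces the A, B, AB example above; the second delivers several monomers in one placement, which is what reduces the number of block moves and washing/activation steps.

```python
def flow_channel_synthesis(regions, placements):
    """Simulate monomer delivery through flow channels (hypothetical model).

    regions: iterable of reaction-region labels on the substrate.
    placements: list of channel-block placements, applied in order. Each
    placement maps a frozenset of regions (one channel) to the monomer flowed
    through that channel. Channels within one placement may not share regions,
    since the block keeps them mechanically separated.
    """
    built = {r: "" for r in regions}
    for placement in placements:
        wetted = set()
        for channel, monomer in placement.items():
            if wetted & channel:
                raise ValueError("channels in one placement overlap")
            wetted |= channel
            for r in channel:
                built[r] += monomer  # KeyError if the channel names an unknown region
    return built


# Two placements, one monomer each: regions 1, 2 and 3 end up carrying A, AB and B.
sequential = flow_channel_synthesis(
    [1, 2, 3],
    [{frozenset({1, 2}): "A"}, {frozenset({2, 3}): "B"}],
)

# One placement with three parallel channels couples A, B and C in a single pass.
parallel = flow_channel_synthesis(
    [1, 2, 3],
    [{frozenset({1}): "A", frozenset({2}): "B", frozenset({3}): "C"}],
)
```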

One of skill in the art will recognize that there are alternative methods of forming channels or otherwise protecting a portion of the surface of the substrate. For example, according to some embodiments, a protective coating such as a hydrophilic or hydrophobic coating (depending upon the nature of the solvent) is utilized over portions of the substrate to be protected, sometimes in combination with materials that facilitate wetting by the reactant solution in other regions. In this manner, the flowing solutions are further prevented from passing outside of their designated flow paths.

The “spotting” methods of preparing compounds and libraries of the present invention can be implemented in much the same manner as the flow channel methods. For example, a monomer A can be delivered to and coupled with a first group of reaction regions which have been appropriately activated. Thereafter, a monomer B can be delivered to and reacted with a second group of activated reaction regions. Unlike the flow channel embodiments described above, reactants are delivered by directly depositing (rather than flowing) relatively small quantities of them in selected regions. In some steps, of course, the entire substrate surface can be sprayed or otherwise coated with a solution. In preferred embodiments, a dispenser moves from region to region, depositing only as much monomer as necessary at each stop. Typical dispensers include a micropipette to deliver the monomer solution to the substrate and a robotic system to control the position of the micropipette with respect to the substrate. In other embodiments, the dispenser includes a series of tubes, a manifold, an array of pipettes, or the like so that various reagents can be delivered to the reaction regions simultaneously.

IV. Hybridization.

Nucleic acid hybridization simply involves providing a denatured probe and target nucleic acid under conditions where the probe and its complementary target can form stable hybrid duplexes through complementary base pairing. The nucleic acids that do not form hybrid duplexes are then washed away, leaving the hybridized nucleic acids to be detected, typically through detection of an attached detectable label. It is generally recognized that nucleic acids are denatured by increasing the temperature or decreasing the salt concentration of the buffer containing the nucleic acids. Under low stringency conditions (e.g., low temperature and/or high salt), hybrid duplexes (e.g., DNA:DNA, RNA:RNA, or RNA:DNA) will form even where the annealed sequences are not perfectly complementary. Thus, specificity of hybridization is reduced at lower stringency. Conversely, at higher stringency (e.g., higher temperature or lower salt), successful hybridization requires fewer mismatches.

One of skill in the art will appreciate that hybridization conditions may be selected to provide any degree of stringency. In a preferred embodiment, hybridization is performed at low stringency, in this case in 6×SSPE-T at 37° C. (0.005% Triton X-100), to ensure hybridization, and subsequent washes are then performed at higher stringency (e.g., 1×SSPE-T at 37° C.) to eliminate mismatched hybrid duplexes. Successive washes may be performed at increasingly higher stringency (e.g., down to as low as 0.25×SSPE-T at 37° C. to 50° C.) until a desired level of hybridization specificity is obtained. Stringency can also be increased by addition of agents such as formamide. Hybridization specificity may be evaluated by comparison of hybridization to the test probes with hybridization to the various controls that can be present (e.g., expression level controls, normalization controls, mismatch controls, etc.).

In general, there is a tradeoff between hybridization specificity (stringency) and signal intensity. Thus, in a preferred embodiment, the wash is performed at the highest stringency that produces consistent results and that provides a signal intensity greater than approximately 10% of the background intensity. To find this stringency, the hybridized array may be washed in solutions of successively higher stringency and read between each wash. Analysis of the data sets thus produced will reveal a wash stringency above which the hybridization pattern is not appreciably altered and which provides adequate signal for the particular oligonucleotide probes of interest.
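One way to automate the read-between-washes analysis is sketched below in Python. It is an illustration under assumptions not stated in the text: "consistent results" is taken to mean that a read's probe-intensity pattern correlates with the preceding, less stringent read at Pearson r of at least 0.95 (an arbitrary choice), and "adequate signal" means the mean probe intensity exceeds 10% of the background. `pearson` and `choose_wash` are hypothetical helpers.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length intensity lists (0.0 if either is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def choose_wash(reads, background, min_frac=0.10, min_r=0.95):
    """Pick the most stringent wash that is still consistent and bright enough.

    reads: list of (label, intensities) ordered from least to most stringent wash,
    each read taken on the same probes. A wash qualifies when its mean signal
    exceeds min_frac * background and its pattern correlates with the preceding
    read at r >= min_r. Returns the label of the last qualifying wash, or None.
    """
    best = None
    for (_, prev), (label, cur) in zip(reads, reads[1:]):
        adequate = sum(cur) / len(cur) > min_frac * background
        consistent = pearson(prev, cur) >= min_r
        if adequate and consistent:
            best = label
    return best
```

A rank correlation or a per-probe ratio test would serve equally well as the consistency check; the correlation is used here only because it is short to write.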

In a preferred embodiment, background signal is reduced by the use of a detergent (e.g., CTAB) or a blocking reagent (e.g., sperm DNA, cot-1 DNA, etc.) during the hybridization to reduce non-specific binding. In a particularly preferred embodiment, the hybridization is performed in the presence of about 0.5 mg/ml DNA (e.g., herring sperm DNA). The use of blocking agents in hybridization is well known to those of skill in the art (see, e.g., Chapter 8 in P. Tijssen, supra).

The stability of duplexes formed between RNAs or DNAs generally follows the order RNA:RNA > RNA:DNA > DNA:DNA in solution. Long probes have better duplex stability with a target, but poorer mismatch discrimination than shorter probes (mismatch discrimination refers to the measured hybridization signal ratio between a perfect match probe and a single base mismatch probe). Shorter probes (e.g., 8-mers) discriminate mismatches very well, but their overall duplex stability is low.

Altering the thermal stability (T_m) of the duplex formed between the target and the probe using, e.g., known oligonucleotide analogues allows for optimization of duplex stability and mismatch discrimination. One useful aspect of altering the T_m arises from the fact that adenine-thymine (A-T) duplexes have a lower T_m than guanine-cytosine (G-C) duplexes, due in part to the fact that A-T base pairs have 2 hydrogen bonds per base pair, while G-C base pairs have 3. In heterogeneous oligonucleotide arrays in which there is a non-uniform distribution of bases, it is not generally possible to optimize hybridization for each oligonucleotide probe simultaneously. Thus, in some embodiments, it is desirable to selectively destabilize G-C duplexes and/or to increase the stability of A-T duplexes. This can be accomplished, e.g., by substituting hypoxanthine for guanine residues in probes that form G-C duplexes, by substituting 2,6-diaminopurine for adenine residues in probes that form A-T duplexes, or by using the salt tetramethylammonium chloride (TMACl) in place of NaCl.
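The size of the base-composition effect can be seen with the Wallace "2 + 4" rule, a rough rule of thumb for short DNA oligonucleotides in high salt. It is not taken from this disclosure, and it ignores nearest-neighbour effects, salt concentration, and analogue bases such as hypoxanthine or 2,6-diaminopurine; `wallace_tm` is an illustrative name.

```python
def wallace_tm(seq):
    """Approximate Tm in degrees C by the Wallace '2 + 4' rule of thumb.

    Counts 2 degrees per A-T pair and 4 degrees per G-C pair. Intended only for
    short DNA oligonucleotides in high salt; it is not a nearest-neighbour model.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G and T")
    return 2 * at + 4 * gc


print(wallace_tm("A" * 10 + "T" * 10))  # 20-mer with no G or C -> 40
print(wallace_tm("G" * 10 + "C" * 10))  # 20-mer with no A or T -> 80
```

Even in this crude approximation, two 20-mers on the same array can differ by about 40° C. in melting temperature, which is why a single hybridization and wash condition cannot be optimal for every probe.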

Altered duplex stability conferred by using oligonucleotide analogue probes can be ascertained by following, e.g., the fluorescence signal intensity of oligonucleotide analogue arrays hybridized with a target oligonucleotide over time. The data allow optimization of specific hybridization conditions at, e.g., room temperature (for simplified diagnostic applications in the future).

Another way of verifying altered duplex stability is by following the signal intensity generated upon hybridization over time. Previous experiments using DNA targets and DNA chips have shown that signal intensity increases with time, and that the more stable duplexes generate higher signal intensities faster than less stable duplexes. The signals reach a plateau or “saturate” after a certain amount of time because all of the binding sites become occupied. These data allow for optimization of hybridization and determination of the best conditions at a specified temperature.
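A common way to reason about such time courses, offered here as an assumed model rather than anything stated in this disclosure, is pseudo-first-order Langmuir binding with the target in excess. With S_sat the signal at full occupancy of the binding sites, C the target concentration, and k_on and k_off the association and dissociation rate constants:

```latex
S(t) = S_{\mathrm{sat}}\,\frac{k_{\mathrm{on}}C}{k_{\mathrm{obs}}}\left(1 - e^{-k_{\mathrm{obs}}t}\right),
\qquad k_{\mathrm{obs}} = k_{\mathrm{on}}C + k_{\mathrm{off}}
```

Because (1 - e^{-kt})/k decreases as k increases, for two duplexes with the same k_on the more stable one (smaller k_off) gives a higher signal at every time point and a higher plateau, S_sat k_on C / k_obs, which approaches S_sat when k_on C is much larger than k_off. Both start with the same initial slope, S_sat k_on C, so in this model the difference between them grows with time.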

Methods of optimizing hybridization conditions are well known to those of skill in the art (see, e.g., Laboratory Techniques in Biochemistry and Molecular Biology, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, N.Y. (1993)).

V. Signal Detection.

Means of detecting labeled target (sample) nucleic acids hybridized to the probes of the high density array are known to those of skill in the art. Thus, for example, where a colorimetric label is used, simple visualization of the label is sufficient. Where a radioactively labeled probe is used, detection of the radiation (e.g., with photographic film or a solid state detector) is sufficient.

In a preferred embodiment, however, the target nucleic acids are labeled with a fluorescent label, and the localization of the label on the probe array is accomplished with fluorescence microscopy. The hybridized array is excited with a light source at the excitation wavelength of the particular fluorescent label, and the resulting fluorescence at the emission wavelength is detected. In a particularly preferred embodiment, the excitation light source is a laser appropriate for the excitation of the fluorescent label.

A confocal microscope used for this purpose may be automated with a computer-controlled stage to automatically scan the entire high density array. Similarly, the microscope may be equipped with a phototransducer (e.g., a photomultiplier, a solid state array, a CCD camera, etc.) attached to an automated data acquisition system to automatically record the fluorescence signal produced by hybridization to each oligonucleotide probe on the array. Such automated systems are described at length in U.S. Pat. No. 5,143,854, PCT Publication No. WO 92/10092, and copending U.S. Ser. No. 08/195,889, filed on Feb. 10, 1994. Use of laser illumination in conjunction with automated confocal microscopy for signal detection permits detection at a resolution of better than about 100 μm, more preferably better than about 50 μm, and most preferably better than about 25 μm.

VI. Signal Evaluation.

One of skill in the art will appreciate that methods for evaluating the hybridization results vary with the nature of the specific probe nucleic acids used as well as the controls provided. In the simplest embodiment, the fluorescence intensity for each probe is simply quantified. This is accomplished by measuring probe signal strength at each location (representing a different probe) on the high density array (e.g., where the label is a fluorescent label, by detecting the amount of fluorescence (intensity) produced by a fixed excitation illumination at each location on the array). Comparison of the absolute intensities of an array hybridized to nucleic acids from a “test” sample with intensities produced by a “control” sample provides a measure of the relative expression of the nucleic acids that hybridize to each of the probes.

One of skill in the art, however, will appreciate that hybridization signals will vary in strength with the efficiency of hybridization, the amount of label on the sample nucleic acid, and the amount of the particular nucleic acid in the sample. Typically, nucleic acids present at very low levels (e.g., <1 pM) will show a very weak signal. At some low level of concentration, the signal becomes virtually indistinguishable from background. In evaluating the hybridization data, a threshold intensity value may be selected below which a signal is not counted, because it is essentially indistinguishable from background.

Where it is desirable to detect nucleic acids expressed at lower levels, a lower threshold is chosen. Conversely, where only high expression levels are to be evaluated, a higher threshold level is selected. In a preferred embodiment, a suitable threshold is about 10% above the average background signal.
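The threshold rule can be written as a short Python sketch. `present_calls` and its `margin` parameter are hypothetical; the default margin encodes the preferred threshold of about 10% above the average background, and lowering or raising it corresponds to the lower and higher thresholds described above.

```python
def present_calls(signals, background, margin=0.10):
    """Call each probe 'present' when its signal exceeds the mean background by margin.

    signals: {probe_id: intensity}; background: list of background intensities.
    With the default margin the cutoff is 10% above the average background.
    """
    mean_bg = sum(background) / len(background)
    cutoff = mean_bg * (1 + margin)
    return {probe: value > cutoff for probe, value in signals.items()}
```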

In addition, the provision of appropriate controls permits a more detailed analysis that controls for variations in hybridization conditions, cell health, non-specific binding and the like. Thus, for example, in a preferred embodiment, the hybridization array is provided with normalization controls as described above in Section II.A.2. These normalization controls are probes complementary to control sequences added in a known concentration to the sample. Where the overall hybridization conditions are poor, the normalization controls will show a smaller signal reflecting reduced hybridization. Conversely, where hybridization conditions are good, the normalization controls will provide a higher signal reflecting the improved hybridization. Normalization of the signal derived from other probes in the array to the normalization controls thus provides a control for variations in hybridization conditions. Typically, normalization is accomplished by dividing the measured signal from the other probes in the array by the average signal produced by the normalization controls. Normalization may also include correction for variations due to sample preparation and amplification. Such normalization may be accomplished by dividing the measured signal by the average signal from the sample preparation/amplification control probes (e.g., the Bio B probes). The resulting values may be multiplied by a constant value to scale the results.
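The division-by-average-control step can be sketched in Python as follows. `normalize` is a hypothetical helper; it takes whichever control set is appropriate (the normalization controls, or the Bio B sample preparation/amplification controls) and an optional scaling constant.

```python
from statistics import mean


def normalize(signals, control_signals, scale=1.0):
    """Divide every probe signal by the mean control signal, then rescale.

    control_signals can be the normalization-control probe intensities or, for
    the sample preparation/amplification correction, the Bio B probe intensities.
    """
    reference = mean(control_signals)
    if reference <= 0:
        raise ValueError("control signal must be positive")
    return {probe: scale * value / reference for probe, value in signals.items()}
```

For example, with normalization controls averaging 200 and a scale of 100, a probe reading 400 becomes 200.0 and one reading 100 becomes 50.0.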

As indicated above, the high density array can include mismatch controls. In a preferred embodiment, there is a mismatch control having a central mismatch for every probe (except the normalization controls) in the array. It is expected that, after washing under stringent conditions where a perfect match would be expected to hybridize to the probe but not to the mismatch, the signal from the mismatch controls should reflect only non-specific binding or the presence in the sample of a nucleic acid that hybridizes with the mismatch. Where both the probe in question and its corresponding mismatch control show high signals, or the mismatch shows a higher signal than its corresponding test probe, there is a problem with the hybridization, and the signal from those probes is ignored. The difference in hybridization signal intensity between the target-specific probe and its corresponding mismatch control is a measure of the discrimination of the target-specific probe. Thus, in a preferred embodiment, the signal of the mismatch probe is subtracted from the signal from its corresponding test probe to provide a measure of the signal due to specific binding of the test probe.

The concentration of a particular sequence can then be determined by measuring the signal intensity of each of the probes that bind specifically to that gene and normalizing to the normalization controls. Where the signal from a probe is greater than that of its mismatch, the mismatch signal is subtracted. Where the mismatch intensity is equal to or greater than that of its corresponding test probe, the signal is ignored. The expression level of a particular gene can then be scored by the number of positive signals (either absolute or above a threshold value), the intensity of the positive signals (either absolute or above a selected threshold value), or a combination of both metrics (e.g., a weighted average).
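The subtraction and scoring rules can be combined in one small Python sketch. `score_gene` is a hypothetical helper: it drops probe pairs whose mismatch is equal to or greater than the perfect match, subtracts the mismatch from the rest, and returns both scoring metrics mentioned above (the count of positive signals and their mean intensity) so that a caller can weight them as desired.

```python
def score_gene(pairs, threshold=0.0):
    """Score one gene from its (PM, MM) probe-pair intensities.

    Pairs whose mismatch is equal to or greater than the perfect match are
    ignored; otherwise the mismatch is subtracted. Returns the number of
    differences above `threshold` and their mean (0.0 when there are none).
    """
    diffs = [pm - mm for pm, mm in pairs if pm > mm]
    positives = [d for d in diffs if d > threshold]
    average = sum(positives) / len(positives) if positives else 0.0
    return len(positives), average
```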

It is a surprising discovery of this invention that normalization controls are often unnecessary for useful quantification of a hybridization signal. Thus, where optimal probes have been identified in the two-step selection process described above in Section II.B., the average hybridization signal produced by the selected optimal probes provides a good quantitative measure of the concentration of hybridized nucleic acid.

VII. Monitoring Expression Levels

As indicated above, the methods of this invention may be used to monitor expression levels of a gene in a wide variety of contexts. For example, where the effects of a drug on gene expression are to be determined, the drug will be administered to an organism, a tissue sample, or a cell. Nucleic acids from the treated tissue sample, cell, or a biological sample from the organism, and from an untreated organism, tissue sample or cell, are isolated as described above and hybridized to a high density probe array containing probes directed to the gene of interest, and the expression levels of that gene are determined as described above.

Similarly, where the expression levels of a disease marker (e.g., P53, RTK, or HER2) are to be detected (e.g., for the diagnosis of a pathological condition in a patient), comparison of the expression levels of the disease marker in the sample with those in a healthy organism will reveal any deviations in the expression levels of the marker in the test sample as compared to the healthy sample. Correlation of such deviations with a pathological condition provides a diagnostic assay for that condition.

EXAMPLES

The following examples are offered to illustrate, but not to limit, the present invention.

Example 1 Detection of the Expression Levels of Target Genes

Experiments were designed to evaluate the specificity of hybridization, the relationship between hybridization signal and concentration of target nucleic acid, and the quantifiability of RNA detection at low concentration levels. These experiments involved hybridizing labeled RNA from a number of preselected genes (IL-2, IL-3, IL-4, IL-6, IL-10, IL-12p40, GM-CSF, IFN-γ, TNF-α, mCTLA8, β-actin, GAPDH, IL-11 receptor, and Bio B) to a high density oligonucleotide probe array comprising a large number of probes complementary to subsequences of these genes (see Section B, below, for a description of the array) in the presence or absence of an RNA sample transcribed from a cDNA library. The target genes were hybridized to the high density probe array either individually or together, in each case with or without labeled RNA transcribed from a murine cDNA library, as described below.

A) Preparation of Labeled RNA.

1) From Each of the Preselected Genes.

Fourteen genes (IL-2, IL-3, IL-4, IL-6, IL-10, IL-12p40, GM-CSF, IFN-γ, TNF-α, CTLA8, β-actin, GAPDH, IL-11 receptor, and Bio B) were each cloned into the pBluescript II KS(+) phagemid (Stratagene, La Jolla, Calif., USA). The orientation of the insert was such that T3 RNA polymerase gave sense transcripts and T7 polymerase gave antisense RNA.

In vitro transcription was done with cut templates in a manner like that described by Melton et al., Nucleic Acids Research, 12: 7035-7056 (1984). A typical in vitro transcription reaction used 5 μg DNA template, a buffer such as that included in Ambion's Maxiscript In Vitro Transcription Kit (Ambion Inc., Austin, Tex., USA), and GTP (3 mM), ATP (1.5 mM), UTP and fluoresceinated UTP (3 mM total, UTP:Fl-UTP 1:1), and CTP and fluoresceinated CTP (2 mM total, CTP:Fl-CTP 3:1). Reactions done in the Ambion buffer had 20 mM DTT and RNase inhibitor. The T7 polymerase was a high concentration polymerase (activity about 2500 units/μL) available from Epicentre Technologies, Madison, Wis., USA. The reaction was run for 1.5 to about 8 hours.
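The labeled-to-unlabeled nucleotide ratios translate into individual concentrations as below; this is only the arithmetic implied by the stated totals and ratios, wrapped in a hypothetical helper `split_labeled`.

```python
def split_labeled(total_mM, unlabeled_parts, labeled_parts):
    """Split a total nucleotide concentration (mM) by an unlabeled:labeled ratio."""
    whole = unlabeled_parts + labeled_parts
    return (total_mM * unlabeled_parts / whole,
            total_mM * labeled_parts / whole)


utp, fl_utp = split_labeled(3.0, 1, 1)  # UTP:Fl-UTP 1:1, 3 mM total
ctp, fl_ctp = split_labeled(2.0, 3, 1)  # CTP:Fl-CTP 3:1, 2 mM total
```

This gives 1.5 mM each of UTP and Fl-UTP, and 1.5 mM CTP with 0.5 mM Fl-CTP, so fluoresceinated nucleotide makes up half of the UTP pool and a quarter of the CTP pool.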

The unincorporated nucleotide triphosphates were removed using a Microcon-100 or Pharmacia MicroSpin S-200 column. The labeled RNA was then fragmented in a pH 8.1 Tris-HCl buffer containing 30 mM Mg(OAc)₂ at 94° C. for 30 to 40 minutes, depending on the length of the RNA transcript.

2) From cDNA Libraries.

Labeled RNA was produced from one of two murine cell lines: T10, a B cell plasmacytoma which was known not to express the genes (except IL-10, actin and GAPDH) used as target genes in this study, and 2D6, an IL-12 growth-dependent T cell line (Th₁ subtype) that is known to express most of the genes used as target genes in this study. Thus, RNA derived from the T10 cell line provided a good total RNA baseline mixture suitable for spiking with known quantities of RNA from the particular target genes. In contrast, mRNA derived from the 2D6 cell line provided a good positive control providing typical endogenously transcribed amounts of the RNA from the target genes. To produce the T10 cDNA library, cDNA was directionally cloned into λSHlox-1 (GibcoBRL, Gaithersburg, Md., USA) at EcoRI/HindIII to give a phage library. The phage library was converted to a plasmid library using “automatic” Cre-loxP plasmid subcloning according to the method of Palazzolo et al., Gene, 88: 25-36 (1990). After this, the DNA was linearized with NotI, and T7 polymerase was used to generate labeled T10 RNA in an in vitro transcription reaction as described above.

Labeled 2D6 mRNA was produced by directionally cloning the 2D6 cDNA into λZipLox NotI-SalI arms, available from GibcoBRL, in a manner similar to T10. The linearized pZL1 library was transcribed with T7 to generate sense RNA as described above.

B) High Density Array Preparation

A high density array of 20-mer oligonucleotide probes was produced using VLSIPS technology. The high density array included the oligonucleotide probes listed in Table 1. A central mismatch control probe was provided for each gene-specific probe, resulting in a high density array containing over 16,000 different oligonucleotide probes.

TABLE 1
High density array design. For every probe there was also a mismatch control having a central 1 base mismatch.

Probe Type                           Target Nucleic Acid    Number of Probes
Test Probes                          IL-2                   691
                                     IL-3                   751
                                     IL-4                   361
                                     IL-6                   691
                                     IL-10                  481
                                     IL-12p40               911
                                     GM-CSF                 661
                                     IFN-γ                  991
                                     TNF-α                  641
                                     mCTLA8                 391
                                     IL-11 receptor         158
House Keeping Genes                  GAPDH                  388
                                     β-actin                669
Bacterial gene (sample               Bio B                  286
preparation/amplification control)

The high density array was synthesized on a planar glass slide.
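As an arithmetic check on the "over 16,000" figure, the Table 1 counts can be summed in a few lines of Python. The sketch assumes, as the table note states, exactly one central-mismatch control per listed probe and no other probes on the array.

```python
# Probe counts per target as listed in Table 1 (perfect-match probes only).
TABLE_1 = {
    "IL-2": 691, "IL-3": 751, "IL-4": 361, "IL-6": 691, "IL-10": 481,
    "IL-12p40": 911, "GM-CSF": 661, "IFN-γ": 991, "TNF-α": 641,
    "mCTLA8": 391, "IL-11 receptor": 158,  # test probes
    "GAPDH": 388, "β-actin": 669,          # housekeeping genes
    "Bio B": 286,                          # sample prep/amplification control
}

perfect_match = sum(TABLE_1.values())
with_mismatch_controls = 2 * perfect_match  # one central-mismatch control per probe
```

This gives 8,071 gene-specific probes and 16,142 probes in total, consistent with the text.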

C) Hybridization Conditions.

The RNA transcribed from cDNA was then hybridized to the high density oligonucleotide probe array at low stringency (e.g., in 6×SSPE-T with 0.5 mg/ml unlabeled, degraded herring sperm DNA as a blocking agent, at 37° C. for 18 hours). The hybridized arrays were washed under progressively more stringent conditions (e.g., in 1×SSPE-T at 37° C. for 7 minutes, down to 0.25×SSPE-T overnight), with the hybridized array being read by a laser-illuminated scanning confocal fluorescence microscope between washes.

It was discovered that the excess RNA in the sample frequently bound up the high density array probes and/or targets and apparently prevented the probes from specifically binding with their intended targets. This problem was obviated by hybridizing at temperatures over 30° C. and/or adding the detergent CTAB (cetyltrimethylammonium bromide).

D) Optimization of Probe Selection

In order to optimize probe selection for each of the target genes, the high density array of oligonucleotide probes was hybridized with the mixture of labeled RNAs transcribed from each of the target genes. Fluorescence intensity at each location on the high density array was determined by scanning the high density array with a laser-illuminated scanning confocal fluorescence microscope connected to a data acquisition system.

Probes were then selected for further data analysis in a two-step procedure. First, in order to be counted, the difference in intensity between a probe and its corresponding mismatch probe had to exceed a threshold limit (50 counts, or about half background, in this case). This eliminated from consideration probes that did not hybridize well and probes for which the mismatch control hybridized at an intensity comparable to the perfect match.

Second, the high density array was hybridized to a labeled RNA sample which, in principle, contains none of the sequences on the high density array. In this case, the oligonucleotide probes were chosen to be complementary to the sense RNA, so an antisense RNA population should have been incapable of hybridizing to any of the probes on the array. Where either a probe or its mismatch showed a signal above a threshold value (100 counts above background), it was not included in subsequent analysis.

Then, the signal for a particular gene was counted as the average difference (perfect match minus mismatch control) for the selected probes for each gene.

E) Interpretation of Results.

1) Specificity of Hybridization

In order to evaluate the specificity of hybridization, the high density array described above was hybridized with 50 pM of the RNA sense strand of either IL-2, IL-3, IL-4, IL-6, actin, GAPDH and Bio B, or IL-10, IL-12p40, GM-CSF, IFN-γ, TNF-α, mCTLA8 and Bio B. The hybridized array showed strong specific signals for each of the test target nucleic acids with minimal cross hybridization.

2) Relationship between Target Concentration and Hybridization Signal

In order to evaluate the relationship between hybridization signal and target concentration, hybridization intensity was measured as a function of the concentration of the RNAs for one or more of the target genes. FIG. 1 shows the results of this experiment. Graphs A and B are plots of the hybridization intensity for high concentrations (50 pM to 10 nM) of IL-4 hybridized to the array for 90 minutes at 22° C. Plot B merely expands the ordinate of plot A to show the low concentration values. In both plots, the hybridization signal increases with target concentration, and the signal level is proportional to the RNA concentration between 50 pM and 1 nM.
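One way to check a proportionality claim like this numerically (a technique chosen here for illustration, not one the text specifies) is to fit a line to log(signal) versus log(concentration): a slope near 1 means the signal scales in proportion to concentration, while a slope below 1 indicates saturation. The function name is hypothetical, and the example inputs are synthetic rather than values read from FIG. 1.

```python
import math


def log_log_slope(concentrations, signals):
    """Least-squares slope of log(signal) against log(concentration).

    A slope close to 1 means signal is proportional to concentration;
    a slope well below 1 suggests the signal is saturating.
    """
    if len(concentrations) != len(signals) or len(concentrations) < 2:
        raise ValueError("need at least two paired points")
    if min(concentrations) <= 0 or min(signals) <= 0:
        raise ValueError("logarithms require positive values")
    xs = [math.log(c) for c in concentrations]
    ys = [math.log(s) for s in signals]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic inputs, not data from FIG. 1.
conc_pM = [50, 100, 250, 500, 1000]
proportional = [3.0 * c for c in conc_pM]
saturating = [c / (1 + c / 200) for c in conc_pM]
print(log_log_slope(conc_pM, proportional))  # close to 1
print(log_log_slope(conc_pM, saturating))    # below 1
```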

Graphs C and D are plots of the average hybridization intensity differences of the 1000 most intense probes when the array is hybridized, for 15 hours at 37° C., to a mixture of 0.5 pM to 20 pM each of labeled RNA from IL-2, IL-3, IL-4, IL-6, IL-10, GM-CSF, IFN-γ, TNF-α, mCTLA8, β-actin, GAPDH, and Bio B. Even this signal, which is in effect averaged across 13 different target RNAs, shows an intensity proportional to target RNA concentration. Again, Graph D expands the ordinate of Graph C to show the low concentration signal.
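The averaged signal in Graphs C and D can be sketched as a top-k average. The text does not say how "most intense" was ranked; this sketch assumes ranking by perfect-match intensity and then averaging the perfect match minus mismatch difference over the top `k` pairs. The function name and the `(pm, mm)` input layout are hypothetical.

```python
import heapq


def top_k_average_difference(pairs, k=1000):
    """Average PM - MM difference over the k probe pairs with the highest PM.

    `pairs` is a sequence of (pm, mm) intensity tuples. Ranking by PM
    intensity is an assumption; the text only says "most intense".
    """
    top = heapq.nlargest(k, pairs, key=lambda pair: pair[0])
    if not top:
        raise ValueError("no probe pairs supplied")
    return sum(pm - mm for pm, mm in top) / len(top)


# Illustrative values only.
pairs = [(900.0, 100.0), (800.0, 300.0), (100.0, 50.0), (50.0, 40.0)]
print(top_k_average_difference(pairs, k=2))
```

If fewer than `k` pairs are supplied, `heapq.nlargest` simply returns all of them, so the average falls back to every available pair.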

At high target nucleic acid concentrations, the hybridization time can be decreased, while at lower target nucleic acid concentrations, the hybridization time should be increased. By varying hybridization time, it is possible to obtain a substantially linear relationship between target RNA concentration and hybridization intensity over a wide range of target RNA concentrations.
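The text gives no kinetic model, but one common simplification shows why shortening the hybridization time at high concentration preserves linearity. The sketch below assumes pseudo-first-order binding (target not depleted, dissociation ignored), under which the occupied fraction is 1 − exp(−k·C·t); the signal stays nearly proportional to C only while k·C·t is small. The rate constant and the function names are placeholders, not values or methods from the text.

```python
import math


def fraction_hybridized(k_on, conc_molar, time_s):
    """Fraction of probe sites occupied after `time_s` seconds.

    Pseudo-first-order model (an assumption): target concentration is
    not depleted during hybridization and dissociation is ignored.
    `k_on` is an association rate constant in 1/(M*s).
    """
    return 1.0 - math.exp(-k_on * conc_molar * time_s)


def linearity_error(u):
    """Relative shortfall of 1 - exp(-u) from the straight line u.

    Here u = k_on * conc * time. Small u keeps signal nearly
    proportional to concentration; large u means saturation.
    """
    if u <= 0:
        raise ValueError("u must be positive")
    return 1.0 - (1.0 - math.exp(-u)) / u


K_ON = 1e6  # placeholder rate constant, not a value from the text

# A 5-fold concentration step (1 nM vs 5 nM) at a short and a long time.
short = fraction_hybridized(K_ON, 5e-9, 10) / fraction_hybridized(K_ON, 1e-9, 10)
long_ = fraction_hybridized(K_ON, 5e-9, 1000) / fraction_hybridized(K_ON, 1e-9, 1000)
print(short, long_)  # near 5 at the short time, far below 5 at the long time
```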

3) Detection of Gene Expression Levels in a Complex Target Sample.

In order to evaluate the ability of the high density array described above to measure variations in expression levels of the target genes, hybridization was performed with the T10 murine library RNA, with the library spiked with 10 pM each of mCTLA8, IL-6, IL-3, IFN-γ, and IL-12, and with the library spiked with 50 pM of each of these RNA transcripts, prepared as described above.

Because simply spiking the RNA mixture with the selected target genes and then immediately hybridizing might give an artificially elevated reading relative to the rest of the mixture, the spiked sample was subjected to a series of procedures to mitigate differences between the library RNA and the added RNA. The “spike” was added to the sample, which was then heated to 37° C. and annealed. The sample was then frozen, thawed, boiled for 5 minutes, cooled on ice and allowed to return to room temperature before hybridization.

The sample was then hybridized at low stringency and washed at progressively higher stringency as described above. The best probes for each target gene were selected as described above in Section D, and the average intensity difference (perfect match minus mismatch) of the probes for each target gene is plotted in FIGS. 2 and 3.

A 50 pM spike represents a target mRNA concentration of about 1 in 24,000, while a 10 pM spike represents a target mRNA concentration of about 1 in 120,000. As illustrated in FIG. 2, the high density array easily resolves and quantifies the relative expression levels of each of the target genes in one simultaneous hybridization. Moreover, the relative expression level is quantifiable: a 5-fold difference in concentration of the target mRNA resulted in a 3- to 6-fold difference in hybridization intensity for the five spiked targets.
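The two conversions above are mutually consistent: each implies a total transcript concentration of about 1.2 × 10^6 pM (1.2 µM), so the 5-fold difference between spikes is also a 5-fold difference in relative abundance. The arithmetic can be sketched as follows; the helper names are hypothetical, and the 1.2 µM total is inferred from the text's own figures rather than stated in it.

```python
def implied_total_pM(spike_pM, one_in):
    """Total transcript concentration implied by 'spike_pM is about 1 in one_in'."""
    return spike_pM * one_in


def abundance_one_in(spike_pM, total_pM):
    """Express a spike concentration as '1 in N' of the total transcript pool."""
    return total_pM / spike_pM


# Both statements in the text imply the same total pool.
print(implied_total_pM(50, 24_000))   # 1200000
print(implied_total_pM(10, 120_000))  # 1200000
```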

FIG. 3 replots FIG. 2 on a condensed scale so that the expression levels of the constitutively expressed GAPDH and actin, and the level of IL-10, which is endogenously expressed by the cell line, are visible. It is notable that the single hybridization to the array resolved expression levels varying from 1 in 1000 for GAPDH to 1 in 124,000 for the spiked mRNAs without the high concentration RNA (the RNA library) overwhelming the signal from the genes expressed at low levels (e.g., IL-10).

It is also worth noting that the endogenous (intrinsic) IL-10 was transcribed at a level comparable to or lower than the spiked RNAs (see FIG. 2); the method is thus capable of quantifying the transcription of genes that are transcribed at physiologically realistic levels.

The method described herein thus easily quantifies changes in RNA concentration of 5 to 10 fold. Detection is highly specific and quantitative at levels as low as 1 in 120,000. The sensitivity and specificity are sufficient to detect low concentration RNAs (comparable to about 20 to 30 copies per cell) in the presence of total mammalian cell message populations. Other experiments have detected concentrations as low as 1 in 300,000, comparable to about 10 RNAs per cell, and the method clearly provides a means for simultaneously screening the transcription levels of hundreds of genes in a complex RNA pool.
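The copy-number figures follow from relative abundance multiplied by the total number of transcripts per cell. The text does not state that total; both of its data points (1 in 120,000 at about 20 to 30 copies, 1 in 300,000 at about 10 copies) are consistent with roughly 3 × 10^6 transcripts per cell, which this sketch uses as an explicitly assumed default. The function name is hypothetical.

```python
def copies_per_cell(one_in, transcripts_per_cell=3_000_000):
    """Approximate copies per cell for a transcript present at '1 in one_in'.

    The default total is inferred from this document's own figures and is
    an assumption; real totals vary by cell type.
    """
    return transcripts_per_cell / one_in


print(copies_per_cell(120_000))  # 25.0, within the stated 20 to 30
print(copies_per_cell(300_000))  # 10.0, matching the stated ~10
```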

It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference for all purposes.

1. A method of simultaneously monitoring the expression of a multiplicity of genes, said method comprising: (a) providing a pool of target nucleic acids comprising RNA transcripts of one or more of said genes, or nucleic acids derived from said RNA transcripts; (b) hybridizing said pool of nucleic acids to an array of oligonucleotide probes immobilized on a surface, said array comprising more than 100 different oligonucleotides wherein each different oligonucleotide is localized in a predetermined region of said surface, the density of said different oligonucleotides is greater than about 60 different oligonucleotides per 1 cm², and said oligonucleotide probes are complementary to said RNA transcripts or said nucleic acids derived from said RNA transcripts; and (c) quantifying the hybridization of said nucleic acids to said array.
 2-54. (canceled)
 55. A method for evaluating whether a sample of at least one labeled target biomolecule is suitable for use in a biopolymeric array assay, said method comprising: (a) providing a substrate having an array thereon; (b) contacting said array with a volume that does not exceed about 5 μl of said sample; (c) detecting any resultant surface bound target biomolecules to obtain signal data; and (d) processing said signal data to evaluate whether said sample of at least one detectably labeled target biomolecule is suitable for use in a biopolymeric array assay.
 56. The method according to claim 55, wherein said sample volume ranges from about 1 μl to about 5 μl.
 57. The method according to claim 55, wherein said conditions comprise incubating said array with said sample for a period of time that does not exceed about 4 hours.
 58. The method according to claim 57, wherein said period of time ranges from about 1 hour to about 3 hours.
 59. The method according to claim 55, wherein said at least one target biomolecule is a nucleic acid.
 60. The method according to claim 55, wherein said substrate comprises a plurality of arrays and said method further comprises contacting each of said plurality of arrays with a sample comprising at least one labeled target biomolecule to simultaneously evaluate each sample of said plurality of samples.
 61. The method according to claim 60, wherein at least two of said plurality of arrays are different.
 62. The method according to claim 60, wherein each of said plurality of samples is the same.
 63. The method according to claim 60, wherein at least two samples of said plurality of samples are different.
 64. The method according to claim 55, wherein said at least one labeled target biomolecule is a fluorescently labeled target biomolecule.
 65. A method for evaluating the quality of a sample of at least one detectably-labeled target biomolecule as suitable for use in a biopolymeric array assay, said method comprising: (a) providing a substrate having an array thereon; (b) contacting said array with said sample; (c) incubating said array and said sample for a period of time that does not exceed about 4 hours; (d) detecting any resultant surface bound detectably-labeled target biomolecules to obtain signal data; and (e) processing said signal data to evaluate whether said sample of at least one detectably labeled target biomolecule is suitable for use in a biopolymeric array assay.
 66. The method according to claim 65, wherein said period of time ranges from about 1 hour to about 3 hours.
 67. The method according to claim 65, wherein a quantity of said sample that does not exceed about 5 μl is contacted to said array.
 68. The method according to claim 67, wherein said quantity ranges in size from about 1 μl to about 5 μl.
 69. The method according to claim 65, wherein said at least one labeled target biomolecule is a nucleic acid.
 70. The method according to claim 65, wherein said substrate comprises a plurality of arrays and said method further comprises contacting each of said plurality of arrays with a plurality of samples, respectively, each sample comprising at least one detectably-labeled target biomolecule, to simultaneously evaluate each sample of said plurality of samples.
 71. The method according to claim 70, wherein at least two of said plurality of arrays are different.
 72. The method according to claim 70, wherein each of said plurality of samples is the same.
 73. The method according to claim 70, wherein at least two samples of said plurality of samples are different.
 74. The method according to claim 65, wherein said at least one labeled target biomolecule is a fluorescently labeled target biomolecule.
 75. A method of performing a biopolymeric array assay, said method comprising: (a) evaluating whether a sample of at least one detectably labeled target biomolecule is suitable for use in an array assay according to the method of claim 55; and (b) performing an array assay with said evaluated sample.
 76. The method according to claim 75, further comprising reading the result of said array assay.
 77. A method comprising forwarding data representing a result of a reading obtained by the method of claim 75.
 78. The method according to claim 77, wherein said data is transmitted to a remote location.
 79. A method comprising receiving data representing a result of a reading obtained by the method of claim 75.
 80. A device for evaluating the quality of a sample of at least one labeled target biomolecule as suitable for use in a biopolymeric array assay, said device comprising: a substrate having at least one evaluation array thereon, wherein said at least one evaluation array is configured to evaluate said sample using a volume of said sample that does not exceed about 5 μl.
 81. The device according to claim 80, wherein said evaluation array comprises from about 4 to about 1000 features of probe molecules.
 82. The device according to claim 81, wherein about 1,000 to about 100,000 probe molecules are present in each feature.
 83. The device according to claim 81, wherein at least some of said features comprise probes of repetitive sequences.
 84. The device according to claim 80, wherein at least one evaluation array comprises tiled probes.
 85. The device according to claim 80, wherein said substrate comprises a plurality of evaluation arrays.
 86. The device according to claim 85, wherein at least two of said plurality of evaluation arrays are different.
 87. The device according to claim 80, wherein said at least one evaluation array includes probes to relatively well-conserved genes between at least two species.
 88. A kit for evaluating the quality of a sample of at least one labeled target biomolecule as suitable for use in a biopolymeric array assay, said kit comprising: (a) at least one array; and (b) instructions for using said at least one evaluation array to evaluate whether a sample is suitable for use in a biopolymeric array assay according to the method of claim 1.
 89. The kit according to claim 88, further including components for labeling said sample with a detectable label.
 90. A method for evaluating whether a sample of at least one labeled target biomolecule is suitable for use in a biopolymeric array assay, said method comprising: (a) providing a substrate having an array thereon; (b) contacting said array with a volume that does not exceed about 10 μl of said sample; (c) detecting any resultant surface bound target biomolecules to obtain signal data; and (d) processing said signal data to evaluate whether said sample of at least one detectably labeled target biomolecule is suitable for use in a biopolymeric array assay.
 91. A method according to claim 90, wherein said conditions comprise incubating said array with said sample for 90 minutes.
 92. A method for evaluating the quality of a sample of at least one detectably-labeled target biomolecule as suitable for use in a biopolymeric array assay, said method comprising: (a) providing a substrate having an array thereon; (b) contacting said array with said sample; (c) incubating said array and said sample for a period of time of 90 minutes; (d) detecting any resultant surface bound detectably-labeled target biomolecules to obtain signal data; and (e) processing said signal data to evaluate whether said sample of at least one detectably labeled target biomolecule is suitable for use in a biopolymeric array assay.
 93. A method according to claim 92, wherein a quantity of said sample that does not exceed about 10 μl is contacted to said array.
 94. A device for evaluating the quality of a sample of at least one labeled target biomolecule as suitable for use in a biopolymeric array assay, said device comprising: a substrate having at least one evaluation array thereon, wherein said at least one evaluation array is configured to evaluate said sample using a volume of said sample that does not exceed about 10 μl.